feat: Cedar HITL approval gates for agent tool use by scoropeza · Pull Request #88 · aws-samples/sample-autonomous-cloud-coding-agents

scoropeza · 2026-05-13T22:44:21Z

Summary

Adds a Cedar-policy-driven Human-In-The-Loop (HITL) approval gate around every agent tool call. Most calls continue to resolve as a plain Allow/Deny with no human involvement. For a small, explicitly-marked set of policy rules (@tier("soft")), the decision becomes require-approval: the agent pauses, the task transitions to AWAITING_APPROVAL, a live-stream marker is emitted, and the run resumes (or adapts to a denial reason) after the user decides via REST or CLI.

Design reference: docs/design/CEDAR_HITL_GATES.md.

What's new (user-visible)

Three-outcome policy engine layered on top of the existing binary Cedar engine, with an in-process recent-decision cache and a pre-approval allowlist.
REST endpoints: POST /v1/tasks/{id}/approve, POST /v1/tasks/{id}/deny, GET /v1/tasks/pending, GET /v1/tasks/policies.
CLI: bgagent pending | approve | deny | policies list|show, plus bgagent submit --approval-timeout and --pre-approve for unattended runs.
Task lifecycle: new AWAITING_APPROVAL status with atomic RUNNING ↔ AWAITING_APPROVAL transitions on the task + approvals tables.
Observability: new approval_requested, approval_recorded, and approval_timed_out audit events; dashboard widget for pending/decided counts; EMF-backed alarm on pending-age p95.
Blueprint authoring: security.cedarPolicies in blueprint.yaml now accepts per-repo hard/soft rules plus disable:, maxPreApprovalScope, and approvalGateCap knobs.

User-facing documentation

New Approval gates (Cedar HITL) section in the user guide (concept, 4 CLI commands, scope reference, pre-approval flow).
New Step 7 — See an approval gate in action end-to-end walkthrough in the quick start.
New Cedar policy guide for blueprint authors (vocabulary, annotations, patterns, parity testing).
AWAITING_APPROVAL, new events, and the two new submit flags threaded through the existing lifecycle/submit docs.

Safety posture

Fail-closed on every infrastructure failure (DDB unavailable, hook crash, parse error, container restart). Never a silent Allow.
Hard-deny rules are absolute — --pre-approve all_session and every other scope form cannot bypass them. Blueprint loader rejects any disable: directive targeting a built-in hard-deny rule.
Per-task gate cap (default 50, max 500) is a terminal circuit breaker for runaway policy matches.
Cross-engine parity contract (contracts/cedar-parity/): every rule change is exercised against both the Python cedarpy engine and the TypeScript cedar-wasm engine; divergence fails CI.
Approvals are per-user-scoped — the REST handlers enforce that only the task's owner (Cognito sub) can approve/deny its gates.

Notable merge surgery

This branch contains two upstream/main merges to stay current during review. Worth flagging for reviewers:

First merge (7019663) — integrated the Slack + Linear integration work:

Retired the HEAD-side link-slack-user.ts handler + SlackUserMappingTable schema in favor of upstream's richer Slack integration (OAuth, slash commands, interactions, notify). HEAD's mapping table had no reader; nothing functional was lost. A follow-up issue will extend upstream's slack-interactions.ts with approve_action:/deny_action: block_actions for Slack-button approvals.
Pinned cedarpy==4.8.0 exact in agent/pyproject.toml (not upstream's >=4.8.1) to preserve the @cedar-policy/[email protected] parity contract documented in mise.toml.
Pinned astro==6.1.10 exact in docs/package.json to close GHSA-xr5h-phrj-8vxv without pulling in transitive CVEs at 6.3.x.
TaskApprovalsTable construct id renamed to TaskApprovalsTableV2 (load-bearing) — the original logical id was abandoned mid-development after the first ship of the user_id-status-index GSI; adding matching_rule_ids to the projection required a destructive recreate (DDB rejects in-place nonKeyAttributes edits).

Second merge (d3791d6) — integrated 3 more upstream commits that landed while the PR was open:

ff79c24 astro 6.1.6 → 6.1.10 bump (upstream caught up to our exact pin; kept ours to block drift to 6.3.2 + transitive CVEs).
a59ca35 github:* resource tagging strategy (new ArnFormat import in agent.ts).
9592796 SlackNotifyFn migrated to FanOutConsumer subscriber (11 files, ~2000 lines upstream refactor — moves Slack delivery off its own DDB stream consumer onto FanOutConsumer, dropping TaskEventsTable from 2 to 1 concurrent stream readers). The only adjustments on our side were updating two drift-guard tests in fanout-task-events.test.ts to reflect the merged CHANNEL_DEFAULTS.slack contents (our approval_requested/approval_stranded + upstream's session_started). Our Cedar HITL surface is otherwise untouched.

Test plan

All suites exercised end-to-end (local + deployed stack on account 169728770098, us-east-1):

agent pytest — 756 passing (1 local-env-only failure blocked by Amazon git-defender; unrelated to this change).
cli jest — 238 passing across 22 suites.
cdk jest — 1526 passing across 85 suites (up from 1418 pre-second-merge: +108 new tests from upstream's fanout/slack-notify refactor).
Cross-engine Cedar parity (contracts/cedar-parity/) — every fixture passes on both engines.
Full pre-commit + pre-push hooks (lint, format, type check, docs sync, gitleaks, OSV deps, semgrep, grype, retire, gh-actions zizmor, agent security) — all green, zero CVEs.
Live E2E on backgroundagent-dev: triggered soft-deny gates on force-push/env-write, approved with multiple scopes, denied with reason-injection, pre-approved scopes at submit, observed accurate timeout→denial transitions, verified dashboard widget + EMF alarms.

Deferred / follow-up

Filed as upstream issues after merge (per our issue-filing discipline):

Slack-button approvals — extend upstream's slack-interactions.ts with approve_action: / deny_action: action_id prefixes + add approval-request Block Kit renderer.
Shared APPROVAL_GATE_CAP_MIN/MAX constants — currently triplicated across agent/src/policy.py, cdk/src/handlers/shared/types.ts, and cdk/src/constructs/blueprint.ts.
Fan-out + Publisher DLQ alarms + EMF 100-updates/sec throttle — both deferred pending notification-channel decision.
LogQueryWidget → EMF migration — long-term observability cleanup.
Prod migration runbook for DDB GSI projection changes (currently dev-only via construct-id rename).

…ial findings Design doc was accidentally removed in 0742ebe; restored from b34d7cd and substantially revised under a new filename. "Phase 3" framing dropped — this is the Cedar HITL approval gates feature. - Renamed PHASE3_CEDAR_HITL.md → CEDAR_HITL_GATES.md; all "phase" gating removed (Phase 3a/3b → v1 / future work §17). - Integrated 16 findings from 2026-05-06 adversarial review with realistic scenarios. Major structural changes: - Decision aws-samples#23 (new): cross-engine parity contract between cedarpy (agent, Python) and @cedar-policy/[email protected] (Lambda, TS). - §11.2: SlackUserMappingTable with OAuth user-initiated mapping; severity- gated Slack approvals; admin has no write path. - §7.1/§12.3: ApproveTaskFn uses cross-table TransactWriteItems for atomicity. - §10.1: user_id-status-index GSI on TaskApprovalsTable; v1 not v-later. - §15.6: cedar-wasm as a Lambda layer shared across policy Lambdas. - Gate-cap revision (2026-05-07): decision aws-samples#13 — default 50, blueprint- configurable via security.approvalGateCap (bounded 1–500), persisted on TaskTable. Cache memory bound decoupled: 50-entry LRU regardless of cap. IMPL-22 adds telemetry-driven re-evaluation criteria. - Timeout adversarial+advocate pass (2026-05-07): - §6.5 VM-throttle race fix: re-read row on failed TIMED_OUT ConditionCheckFailed; honor APPROVED if user beat the timer. IMPL-24. - Sub-120s @approval_timeout_s emits blueprint-load WARN. IMPL-25. - User-visible timeout cap milestones (approval_timeout_capped_at_submit, approval_ceiling_shrinking). IMPL-26. - Runtime JWT: no refresh logic in agent/src/ (container uses IAM role); ceiling stays min(1h, maxLifetime_remaining - 120s). IMPL-27. - Three new CloudWatch metrics for timeout tuning. IMPL-28. - §14.8 new: off-hours trade-off section (fail-closed is the invariant). - §13.13 new: notification-delivery failure does NOT pause the timer (bypass-prevention). - Added six mermaid diagrams: three-outcome decision flow, end-to-end round- trip, TaskApprovalsTable state machine, Slack user-mapping, fail-closed decision flow, cross-engine parity check. - Cross-references updated in INTERACTIVE_AGENTS.md and SECURITY.md. - Starlight mirror regenerated via docs/scripts/sync-starlight.mjs. No code changes in this commit — design work only. Implementation lands in a follow-up PR per §15.2 task list.

…ract Chunk 1 of the Cedar HITL gates PR (docs/design/CEDAR_HITL_GATES.md). Lays the foundation before engine rewrites in Chunk 2+: both Cedar engines pinned exactly per decision aws-samples#23, annotation surface validated by Day-1 spikes per decision aws-samples#22, and the golden-file parity fixtures seeded so every subsequent chunk can rely on the contract. - Pin cedarpy==4.8.0 (agent) and @cedar-policy/[email protected] (cdk) exactly (no ^/~); document both in mise.toml header. - Add agent/tests/test_cedarpy_annotations_contract.py (10 tests) validating all 5 annotations round-trip verbatim via policies_to_json_str() under staticPolicies.<id>.annotations. - Add cdk/test/handlers/shared/cedar-policy.test.ts (12 tests) validating policySetTextToParts + policyToJson extract the same annotations verbatim and isAuthorized returns the documented {type, response} wrapper shape. - Add contracts/cedar-parity/ with 5 golden-file fixtures (single-match, multi-match, hard-deny, soft-deny write, no-match default-allow) + README documenting the contract. Every fixture policy carries a @rule_id - including the base permit as @rule_id("base_permit") - so the parity tests raise if either engine returns an unannotated match instead of silently dropping it. - Add agent/tests/test_cedar_parity.py (6 tests, cedarpy side) and cdk/test/handlers/shared/cedar-parity.test.ts (6 tests, cedar-wasm side) loading the shared fixtures and asserting (decision, sorted rule_ids) match expected. Both tests hard-import cedarpy/cedar-wasm so a dependency regression fails loud rather than silently skipping. - Update docs/design/CEDAR_HITL_GATES.md sections 15.2 row 3, 15.6 prose and the parity mermaid diagram to point at contracts/cedar-parity/ (the precedent set by contracts/memory-hash-vectors.json) instead of a new tests/fixtures/ dir. Regenerate the Starlight mirror. - Add IMPL-29 noting the cedarpy diagnostics.reasons / cedar-wasm diagnostics.reason naming asymmetry surfaced by the spikes; engine code normalizes at the boundary. - Fix rev-4 -> rev-5 cosmetic footer drift. Test counts: agent 500 -> 516 (+16), cdk 1036 -> 1054 (+18), cli 190 unchanged. No production code changes in this chunk; engine rewrite lands in Chunk 2. Follow-up: separate chore issue to move contracts/memory-hash-vectors.json into a self-named subdir for consistency with contracts/cedar-parity/.

@Severity

Chunk 2 of the Cedar HITL gates PR. Rewrites agent/src/policy.py into the three-outcome engine specified in docs/design/CEDAR_HITL_GATES.md section 6. The REQUIRE_APPROVAL outcome is the human-in-the-loop surface the next chunks (PreToolUse hook extension, REST API, CLI) plug into. This chunk ships the engine and its load-time validation; no hook or wire-format changes yet. Engine: - Outcome enum (ALLOW, DENY, REQUIRE_APPROVAL) + extended PolicyDecision with .allowed backward-compat shim for Phase 1a/1b/2 callers. Custom __init__ accepts both outcome= and legacy allowed= kwargs so existing tests keep working verbatim. - Three-outcome pipeline per section 6.2: hard-deny eval (absolute) -> allowlist fast-path (tool_type/tool_group/bash_pattern/write_path/ all_session) -> recent-decision cache (60s TTL on DENIED/TIMED_OUT) -> soft-deny eval (with post-eval rule-scope allowlist check and blueprint_disable filtering) -> default ALLOW. - ApprovalAllowlist (section 6.4): parses and matches every scope type. Strips whitespace and rejects empty-after-strip values so "tool_type: Read " normalizes instead of silently mismatching (review finding 6). - RecentDecisionCache (section 12.9): 50-entry LRU, INDEPENDENT of approvalGateCap. Populated only on DENIED/TIMED_OUT. Session-scoped (documented section 12.8 caveat). - Annotation handling (sections 5.2 + 6.3): parses @rule_id, @tier, @approval_timeout_s, @Severity, @category via cedarpy.policies_to_json_str(); merges on multi-match with min timeout (clamped by 30s floor) and max severity. - Load-time validation (sections 5.1, 12.4): rejects missing/mismatched @tier, missing @rule_id, sub-floor timeouts, duplicate rule_ids across tiers, blueprint text > 64 KB, disable entries naming built-in hard-deny rules (finding 9), approval_gate_cap outside [1, 500] (decision 13). Sub-120s @approval_timeout_s emits WARN but accepts (IMPL-25). - Fail-closed posture (section 13): cedarpy parse errors surface via diagnostics.errors -> RuntimeError raised inside _eval_tier -> outer handler returns DENY with reason "fail-closed: <ExceptionType>". TypeError on json.dumps of unhashable tool_input surfaces as distinct "fail-closed: unhashable_tool_input" reason (review finding 5). Built-in policies: - agent/policies/hard_deny.cedar: base_permit catch-all + rm_slash + write_git_internals + write_git_internals_nested + drop_table + pr_review-specific Write/Edit forbids (absolute). - agent/policies/soft_deny.cedar: base_permit (catch-all required in each tier so cedarpy default-deny does not convert no-match into DENY) + force_push_any + force_push_main + push_to_protected_branch + write_env_files + write_credentials. All soft rules carry @tier, @rule_id, @approval_timeout_s, @Severity, @category per section 15.4 starter set. Review findings addressed (1 blocker, 8 significant, plus minor): - blueprint_disable actually disables soft rules at eval time instead of silently no-op (the blocker: test coverage had been a silent-pass). - Legacy extra_policies with @tier/@rule_id rejected to avoid undefined double-annotation behavior. - _matching_rule_ids logs WARN on unknown policy IDs (state-drift signal). - base_permit validator exemption restricted to effect=="permit" so misnamed forbid rules cannot bypass validation (finding 7). - Hard-tier Cedar no_decision logged at WARN (signals missing/malformed base_permit catch-all). - Allowlist whitespace normalization + empty-value rejection. - StrEnum upgrade, Callable moved to TYPE_CHECKING, assert replaced with explicit RuntimeError for S101 compliance. Phase 1 compatibility: - All 39 existing test_policy.py tests pass unchanged via the .allowed property. One test (test_invalid_policy_syntax_fails_closed) updated to patch _hard_policies instead of the removed _policies attribute; docstring explains the rewrite. - extra_policies kwarg preserved; callers with annotated rules must migrate to blueprint_soft_policies / blueprint_hard_policies. Test counts: agent 516 -> 576 (+60: 51 three-outcome + 9 regression fixes). cli 190 unchanged. cdk 1054 unchanged. Carry-forward to Chunk 3: - extra_policies semantic shift (Phase 1 DENY -> Chunk 2 REQUIRE_APPROVAL); .allowed=False preserved but .outcome differs. Switchover happens when hooks.py adopts the three-outcome branching. - Cross-tier action-context asymmetry (review finding 8): document rule-authoring constraint in section 5.5 of design. - Probe entity-shape coverage (finding 10): extend _probe_cedar to exercise Write/Edit/Bash action paths, not just invoke_tool.

Adds the 14 agent-side approval milestone writers (§11.1) on ``_ProgressWriter`` so Chunk 3's hook integration has a typed API instead of stringly-typed ``write_agent_milestone`` calls, and the per-task gate counter / per-container sliding-window rate limit / denial-injection queue on ``PolicyEngine`` that §6.5 requires. Why now: the hook work lands cleanly only after these surfaces exist — every code path in ``pre_tool_use_hook``'s REQUIRE_APPROVAL branch calls one of these helpers. Shipping them separately lets the hook commit be about the state machine, not the event-shape bookkeeping. Engine additions: - ``approval_gate_count`` / ``increment_approval_gate_count``: the per-task counter §12.9 bounds at ``approvalGateCap``. Session-scoped in v1; persistence tracked in §17. - ``approvals_in_last_minute`` / ``record_approval_gate_timestamp``: sliding-window rate limit (20/min/container, §12.9). Prune on read so callers see the current count without a separate tick. - ``queue_denial_injection`` / ``drain_denial_injections``: queue consumed by ``_denial_between_turns_hook`` at the next Stop seam (§6.5). Reason is pre-sanitized upstream by ``DenyTaskFn``. - ``mark_ceiling_shrinking_emitted``: emit-once latch for IMPL-26. - ``APPROVAL_RATE_LIMIT`` / ``APPROVAL_RATE_WINDOW_S`` module consts the hook imports rather than re-deriving. Milestone writers (§11.1 table, 14 agent-emitted of 15): - ``pre_approvals_loaded``, ``approval_requested``, ``approval_granted``, ``approval_denied``, ``approval_timed_out``, ``approval_stranded``, ``approval_write_failed``, ``approval_resume_failed``, ``approval_poll_degraded``, ``approval_timeout_capped``, ``approval_ceiling_shrinking``, ``approval_cap_exceeded``, ``approval_rate_limit_exceeded``, ``approval_late_win``. - ``approval_decision_recorded`` (Lambda audit) and ``approval_timeout_capped_at_submit`` (CreateTaskFn) stay on the Lambda side — Chunk 5 owns those. Each helper is a thin wrapper over ``_put_event("agent_milestone", ...)`` so the shared circuit-breaker + classifier path (finding aws-samples#6/aws-samples#8) continues to apply. Metadata keys mirror the §11.1 shapes verbatim (``maxLifetime_remaining_s`` preserves the design-doc spelling for downstream parsers). Tests: +29 total. 17 on ``TestApprovalMilestoneHelpers`` pin the DDB payload shape for each helper (including the two ``approval_timeout_capped`` reason variants — rule_annotation carries matching_rule_ids; maxLifetime_ceiling omits the field). 12 on the engine: counter monotonicity, rate-window prune semantics at window boundary, denial-queue FIFO + drain-clears, ceiling-shrinking latch idempotency. No caller changes — engine and writer surfaces are additive. Hook integration lands in commit C.

…ives Adds the four agent-side DDB primitives §6.5 + IMPL-24 need for the three-outcome hook integration in the next commit: - ``transact_write_approval_request`` — cross-table TransactWriteItems: Put(TaskApprovalsTable) with ``attribute_not_exists(request_id)`` + Update(TaskTable) gated on ``status = RUNNING``. Atomic per §12.3 so a concurrent cancel cannot land the task in AWAITING_APPROVAL with no matching approval row (or vice versa). - ``transact_resume_from_approval`` — Update(TaskTable) gated on ``status = AWAITING_APPROVAL AND awaiting_approval_request_id = :rid``. The ``request_id`` condition prevents resuming with a stale ID after a reconciler race (§13.9). - ``best_effort_update_approval_status`` — conditional UpdateItem on the approval row with ``status = :pending`` guard. Returns False on ``ConditionalCheckFailedException``; this is the signal IMPL-24's re-read path fires on (§6.5 pseudocode lines 846-879, §13.12). - ``get_approval_row`` — GetItem with ``ConsistentRead=True`` by default. Required by IMPL-24's re-read; kept opt-out (bool flag) for future cold-path callers that don't need the strong read. Errors: - ``ApprovalTablesUnavailable`` for env-var-missing — raised loud so a pre-Chunk-4 deploy fails closed (hook will map to DENY) rather than silently no-op'ing the gate. - ``ApprovalWriteError`` / ``ApprovalResumeError`` wrap ``TransactionCanceledException`` with the cancellation reasons list. The hook uses these to distinguish the "concurrent cancel" branch from real DDB outages. - ``ConditionalCheckFailedException`` on ``update_item`` is consumed and returned as ``False`` from ``best_effort_update_approval_status`` — the caller (hook) needs the boolean to decide whether to re-read, not to propagate. - All other DDB errors propagate so the hook's outer try/except can classify fail-closed with a specific reason. Implementation notes: - Uses ``boto3.client("dynamodb")`` low-level API (not resource). ``transact_write_items`` lives on the client, and marshalling the approval row attributes explicitly gives deterministic DDB shapes that the tests can assert on. ``_py_to_ddb_attr`` covers the subset of Python types §10.1 actually uses (str/int/bool/None/list of str); any other type raises TypeError loudly rather than silently writing something unexpected. - ``_extract_error_code`` / ``_extract_cancellation_reasons`` duck-type on ``exc.response`` so we don't need botocore at import time (tests use a minimal exception class). - Errors from unsupported types (floats, dicts, etc.) are caught BEFORE the DDB round-trip so the unit-test asserts ``transact_write_items`` was not called — catches schema drift early. - Status constants (``_STATUS_RUNNING`` / ``_STATUS_AWAITING_APPROVAL``) named so a rename in CDK cannot silently diverge the Python path. Tests: +20 total. - 5 on TransactWriteApprovalRequest: env-missing, happy-path shape assertion (both items + conditions), TransactionCanceled → ApprovalWriteError with reasons preserved, other errors propagate, unsupported type rejected before any DDB call. - 3 on TransactResumeFromApproval: env-missing, happy-path expression shape (includes REMOVE awaiting_approval_request_id), cancel → ApprovalResumeError. - 4 on BestEffortUpdateApprovalStatus: happy path returns True, ``reason`` kwarg attaches ``deny_reason``, ConditionalCheckFailed returns False (IMPL-24's signal), other errors propagate. - 4 on GetApprovalRow: ConsistentRead default True, opt-out False, row-not-found returns None, row unmarshalling through every supported DDB attribute type. - 4 on helpers: error-code extraction with and without ClientError-shape, cancellation-reasons extraction with and without. No runtime callers yet — hook integration lands in commit C. Physical TaskApprovalsTable lands in Chunk 4; Python side is wire-compatible so the hook work can be unit-tested today with mocked clients.

Wires the agent to the full §6.5 pseudocode: cap + rate-limit check, atomic TransactWriteItems for pending row + TaskTable AWAITING_APPROVAL, 2s→5s ConsistentRead poll, IMPL-24 VM-throttle race re-read, resume transition, scope propagation to allowlist, and denial-injection queue consumed at the next Stop seam. Completes §15.2 rows 26 + 27. Hook control flow (three outcomes) ---------------------------------- - ALLOW / DENY: existing Phase 1 behavior, now switching on ``.outcome`` rather than ``.allowed``. Legacy Phase 1/2 tests still green because PolicyDecision preserves the ``.allowed`` shim. - REQUIRE_APPROVAL (new): extracted into ``_handle_require_approval`` for readability. Delegates to ``task_state`` primitives and ``engine.*`` counter surfaces from the prior commits; no new DDB client construction here. Key pieces: - ``_compute_effective_timeout`` applies the §6.5 min(rule, default, lifetime) formula. The engine's ``_merge_annotations`` has already clipped decision.timeout_s against the task default; the hook adds the remaining-lifetime ceiling and floors at FLOOR_30S. ``clip_reason`` distinguishes ``rule_annotation`` (rule was tighter than task default) from ``maxLifetime_ceiling`` (task is late in its life) so ``approval_timeout_capped`` carries the right reason. - ``_remaining_maxlifetime_s`` reads ``AGENTCORE_MAX_LIFETIME_S`` + ``TASK_STARTED_AT`` env vars (8h default). Returns ``None`` when the start timestamp is absent — the hook treats that as "unknown, don't clip" rather than pre-DENYing, so Phase 1 test paths that don't set the env var still see the old task-default behaviour. Chunk 4/5 will wire these at task launch. - ``_poll_for_decision`` uses 2s cadence for the first 30s then 5s (IMPL-12). All polls use ``ConsistentRead=True`` per IMPL-24. 3 consecutive GetItem failures emit ``approval_poll_degraded``; 10 consecutive failures fall through as TIMED_OUT with a specific reason (§13.2). - ``_reconcile_late_decision`` implements IMPL-24 re-read: on a ConditionCheckFailed from the TIMED_OUT write, re-read with ConsistentRead. APPROVED → rebuild outcome, propagate scope to allowlist, run normal allow flow, emit ``approval_late_win``. DENIED → honor the user's sanitized reason. PENDING or row gone → fall through with TIMED_OUT (fail-closed, §13.12 last paragraph). Cancel-wins semantics (finding aws-samples#2) ---------------------------------- ``_denial_between_turns_hook`` is registered AFTER ``_nudge_between_turns_hook`` in ``between_turns_hooks`` so cancel short-circuits both. The hook re-checks ``_cancel_requested`` itself as belt-and-braces (matching the nudge hook) so a future reorder does not silently break cancel-wins. Denial queue is PRESERVED on cancel — not drained — so a denial still sitting on the queue when the task is being torn down does not leak across tasks (the engine is per-task per §IMPL-7). ``stop_hook`` threads ``engine`` into ``ctx`` so the denial hook can ``drain_denial_injections``. ``build_hook_matchers`` accepts a new ``user_id`` kwarg (§12.2) so approval rows carry caller identity for the REST side's ownership check. ``permissionDecisionReason`` guaranteed surface ----------------------------------------------- The hook's deny return is the ONLY guaranteed surface the SDK emits to the agent; denial injection is best-effort (pre-empted by cancel). ``_deny_response`` pipes every reason through ``_strip_ansi`` + ``_truncate(500)``: ANSI sequences can never reach the model, and the line stays loggable. §12.7 requirement. Tests: +24 agent hook tests (47 total in test_hooks.py). Run in 0.92s via a ``_fast_poll`` fixture that collapses ``asyncio.sleep`` to a no-op AND advances ``hooks.time.monotonic`` by the requested duration so the poll wall-clock deadline actually trips. Happy paths: - APPROVED + scope propagation to allowlist + milestones. - APPROVED with scope=this_call does NOT grow allowlist. - DENIED queues denial injection + populates recent-decision cache (next identical call auto-denies). - TIMED_OUT writes TIMED_OUT row and emits approval_timed_out. IMPL-24 race: four branches. - APPROVED re-read → allow flow, approval_late_win milestone, scope propagated, resume succeeds. - DENIED re-read → deny flow, approval_late_win milestone, user's reason is the permissionDecisionReason. - Still-PENDING re-read → fail-closed fall-through (no late_win). - Row-gone re-read → same fail-closed fall-through. Cap / rate-limit / write failure / resume failure branches all: - Short-circuit before any DDB write when the local guard fires (cap, rate limit). - Emit the right approval_* milestone. - Return DENY with a specific permissionDecisionReason. Sanitization: - ANSI stripped from deny reason. - Deny reason truncated to ≤500 chars. Timeout clipping: - rule_annotation reason when a rule's approval_timeout_s is below the task default; matching_rule_ids populated. - maxLifetime_ceiling reason when remaining lifetime is the tightest bound; matching_rule_ids is None. - approval_ceiling_shrinking emits exactly once per task (IMPL-26 latch). Denial injection hook (6 tests): - Draining produces a <user_denial request_id=... decided_at=...> block with XML-escaped reason. - Cancel short-circuit preserves the queue so the denial is not lost; just not injected into a dying agent. - Hostile reason (</user_denial>...<user_nudge>) is XML-escaped so the envelope cannot be forged. - No-engine ctx returns [] (Phase 1 call sites still work). - Registered LAST in ``between_turns_hooks`` (invariant for §6.5 finding aws-samples#2). - End-to-end via stop_hook: queued denial becomes ``decision=block`` + reason on the Stop return. Carry-forward ------------- - ``_remaining_maxlifetime_s`` returns None when TASK_STARTED_AT is unset — Chunk 4/5 will wire this at task launch. Tracked in §16. - ``approval_gate_count`` lives on the engine (session-scoped) not on TaskTable in v1. §13.6 notes that the reconciler + approval_gate_cap still bound worst-case across container restarts. Chunk 7+ tracks persistence when telemetry justifies it. - Denial injection emits a ``user_denial_injected`` milestone that is NOT in the §11.1 enumerated table. It mirrors ``nudge_acknowledged`` for stream visibility; keep the name distinct from the ``approval_*`` prefix so future §11.1 consumers can't confuse it with an approval outcome.

Lands the stateless CDK primitives for Cedar-HITL approval gates so Chunk 5's REST handlers can be wired onto concrete tables. Completes §15.2 tasks 9, 20, and 25. Constructs ---------- ``TaskApprovalsTable`` (§10.1) - PK ``task_id`` + SK ``request_id`` (ULID). Matches the agent-side primitives landed in the prior commit. - GSI ``user_id-status-index`` with user_id PK + status SK and an ``INCLUDE`` projection limited to the fields GET /v1/pending renders. Three deny-sensitive attrs (``deny_reason``, ``scope``, ``tool_input_sha256``) deliberately omitted from the projection — the list endpoint only returns PENDING rows in practice, but excluding them kills the projection-leak concern outright and costs no bytes today. - Exports ``USER_STATUS_INDEX_NAME`` as a module constant + mirrors it on ``construct.userStatusIndexName`` so handlers referencing the GSI fail compile-time on a rename. - TTL attribute ``ttl`` (agent writes ``created_at + timeout_s + 120s``). - No DynamoDB streams per §11.2. TaskEventsTable carries the audit fan-out; streams here would duplicate. - Default RemovalPolicy.DESTROY to match the rest of the sample. Production deploys override to RETAIN per §10.1. ``SlackUserMappingTable`` (§11.2, finding aws-samples#4) - Single-key (``slack_user_id`` PK). No SK, no TTL, no GSI, no stream. The forward-only shape is the trust boundary — a reverse GSI (Cognito → Slack) would let a compromised Cognito sub enumerate Slack identities without adding v1 capability. - Writes land through LinkSlackUserFn (Chunk 5) which enforces the ``attribute_not_exists(slack_user_id)`` condition so a prior legitimate mapping cannot be overwritten by a later compromise. ``task-status.ts`` — AWAITING_APPROVAL (§10.3) - Added to TaskStatus enum + ACTIVE_STATUSES (NOT TERMINAL_STATUSES: the task is alive, paused on a human decision). - VALID_TRANSITIONS wires the five edges §10.3 enumerates: RUNNING → AWAITING_APPROVAL (soft-deny entry) HYDRATING → AWAITING_APPROVAL (rare early-gate case) AWAITING_APPROVAL → RUNNING (approve / deny resume) AWAITING_APPROVAL → CANCELLED (user cancel mid-approval) AWAITING_APPROVAL → FAILED (stranded-approval reconciler) - Notably NOT added: AWAITING_APPROVAL → FINALIZING (approve-during-cleanup race) AWAITING_APPROVAL → COMPLETED (skip RUNNING) AWAITING_APPROVAL → TIMED_OUT (timer lives on the approval row, not the task clock) These are regression tests so a future refactor cannot quietly add them and bypass the `awaiting_approval_request_id = :rid` invariant. Tests: +29 total. - TaskApprovalsTable (11 tests): PK/SK schema, PAY_PER_REQUEST, PITR default + override, TTL attribute, NO streams, GSI schema + projection + sensitive-attr exclusion, removal policy default + override, ``USER_STATUS_INDEX_NAME`` constant parity with the construct field. - SlackUserMappingTable (8 tests): single-key schema (explicit KeySchema length assertion), PAY_PER_REQUEST, PITR, no streams, no reverse GSI, DESTROY default, TTL absent. - TaskStatus (+10 tests over existing: 5 new assertions on the 9-state cardinality, AWAITING_APPROVAL membership, and the transition graph including the three forbidden edges). The existing assertions updated for the new state count. No stack wiring yet — ``agent.ts`` instantiation + env var plumbing + grants land in the next commit alongside the Cedar-WASM Lambda layer.

…stack Activates the agent-side approval path and ships the Lambda layer Chunk 5's REST handlers need. Cedar-wasm Lambda layer (§15.2 task 10) ---------------------------------------- ``CedarWasmLayer`` bundles ``@cedar-policy/[email protected]`` into ``/opt/nodejs/node_modules/`` so Lambdas can ``require('@cedar-policy/cedar-wasm/nodejs')`` without shipping the 4 MB wasm binary in every function package. A dedicated ``cdk/layers/cedar-wasm/`` directory carries a minimal ``package.json`` pinning the exact version — bundling runs ``npm install --omit=dev`` against that manifest, so the layer build is hermetic from any ``cdk/node_modules/`` drift. The bundler has two fallbacks: - Docker (``public.ecr.aws/sam/build-nodejs22.x``) for CI / prod deploys. - Local-npm fallback for environments without Docker (unit-test synths + `cdk synth` on runners that lack Docker). The local path is safe here because the layer ships pure JS + a prebuilt wasm binary — no native build step. Three constants exposed from the module: - ``CEDAR_WASM_VERSION`` — single source of truth for the pinned version; tests assert this matches both ``cdk/package.json`` and the layer manifest, so the three places the version lives stay in sync. - ``CEDAR_WASM_MIN_LAMBDA_MEMORY_MB`` — 512 MB floor for attaching Lambdas per §15.2 task 10. - ``CedarWasmLayer.layer`` — the underlying ``LayerVersion`` for Chunk 5 handlers to attach via ``fn.addLayers(...)``. Agent stack wiring (§15.2 task 19) ------------------------------------ ``agent.ts`` now instantiates: - ``TaskApprovalsTable`` (prior commit) — grants RW to the runtime so ``pre_tool_use_hook`` can TransactWriteItems + ConsistentRead the PENDING row. - ``SlackUserMappingTable`` (prior commit) — not granted to the runtime; only the link-user Lambda (Chunk 5) writes here. - ``CedarWasmLayer`` — the layer's asset lands in the synthed template so Chunk 5 handlers can reference ``.layer`` without causing a new asset on their deploy. New runtime env vars: - ``TASK_APPROVALS_TABLE_NAME`` — consumed by ``task_state._require_tables``; its absence previously raised ``ApprovalTablesUnavailable`` → hook DENY. Now set, so the approval path is live on deploy. - ``AGENTCORE_MAX_LIFETIME_S = '28800'`` — 8 hours, matching ``lifecycleConfiguration.maxLifetime``. Consumed by the hook's ``_remaining_maxlifetime_s`` for the maxLifetime ceiling clip (§6.5). Kept in sync with the lifecycle via a direct test assertion so drift surfaces at build time. New CfnOutputs: ``TaskApprovalsTableName``, ``SlackUserMappingTableName``, ``CedarWasmLayerArn``. Each is useful for post-deploy smoke tests (`aws dynamodb describe-table` / `aws lambda get-layer-version`). Tests: +8 layer tests + 9 agent-stack assertions. Layer: - LayerVersion resource count. - Compatible runtimes (nodejs20/22). - Description carries the pinned version. - CEDAR_WASM_VERSION matches ``cdk/package.json``. - CEDAR_WASM_VERSION matches ``layers/cedar-wasm/package.json``. - CEDAR_WASM_MIN_LAMBDA_MEMORY_MB ≥ 512. - Custom description override works. - ``.layer`` exposes a real ``LayerVersion``. Agent stack: - Table count updated from 6 → 8. - TaskApprovalsTable schema match (task_id PK / request_id SK, user_id-status-index GSI presence). - SlackUserMappingTable single-key schema. - LayerVersion count + compatibleRuntimes. - Three new CfnOutputs present. - TASK_APPROVALS_TABLE_NAME env var on the runtime. - AGENTCORE_MAX_LIFETIME_S == '28800' (drift guard). Carry-forward ------------- - ``TASK_STARTED_AT`` is the other input the hook's ``_remaining_maxlifetime_s`` consumes — it's a PER-TASK value the orchestrator must stamp at invocation time, not a stack-level env var. Chunk 5's orchestrator changes need to add it to the runtime invocation payload / session env. For now the hook's fallback ("unknown, don't clip") keeps approvals functional. - Chunk 5 will attach the CedarWasmLayer onto ApproveTaskFn, DenyTaskFn, GetPoliciesFn, CreateTaskFn and assert ``memorySize >= CEDAR_WASM_MIN_LAMBDA_MEMORY_MB`` for each.

Lands the two user-facing REST handlers that flip a PENDING approval row to APPROVED / DENIED, the shared types both call sites and the CLI consume, and the Lambda-side Cedar parser future Chunk-5 handlers (get-policies, create-task validation) will use. Wire types (shared/types.ts) ---------------------------- - ApprovalScope union covering every shape the agent's ApprovalAllowlist understands. Typed so approve-task / create-task / CLI (Chunk 6) all agree at compile time. - ApprovalRecord / ApprovalRequest / ApprovalResponse / DenyRequest / DenyResponse / PendingApprovalSummary / GetPendingResponse / PolicyRuleSummary / GetPoliciesResponse / LinkSlackUserRequest / LinkSlackUserResponse / SlackUserMappingRecord / ApprovalDecisionRecordedEvent / CreateTaskApprovalExtensions. - Constants: DENY_REASON_MAX_LENGTH=2000, INITIAL_APPROVALS_MAX_ENTRIES=20, INITIAL_APPROVALS_MAX_ENTRY_LENGTH=128, APPROVAL_TIMEOUT_S_MIN=30, APPROVAL_TIMEOUT_S_MAX=3600, APPROVAL_TIMEOUT_S_DEFAULT=300. New error codes: REQUEST_NOT_FOUND, REQUEST_ALREADY_DECIDED, TASK_NOT_AWAITING_APPROVAL. Shared helpers -------------- - shared/approval-scope.ts — parseApprovalScope validates every shape; rejects unknown tool types / groups / prefixes, empty values, over-128-char strings. isDegeneratePattern implements §7.4 (length ≤ 2, all-wildcard, wildcard ratio > 50%) for Chunk-5 create-task. - shared/deny-reason-scanner.ts — scanDenyReason redacts AWS keys, GitHub PATs (classic + fine-grained), Slack tokens, PEM blocks, bearer tokens with [REDACTED-...] markers. Mirrors agent/src/output_scanner.py so the deny reason the agent ultimately reads is never raw user input. - shared/cedar-policy.ts — parseRules pulls the five HITL annotations (tier/rule_id/severity/approval_timeout_s/category) into a ParsedRule[], preserving positional policy_id for IMPL-29 diagnostics-to-rule_id resolution. isHardDenyRule, isValidRuleId, matchingRuleIds, concatPolicies exposed for future handlers. Handlers -------- - approve-task.ts (§7.1) — POST /v1/tasks/{task_id}/approve - Cross-table TransactWriteItems: approval row PENDING → APPROVED guarded by user_id = :caller AND status = :pending; TaskTable no-op Update guarded by status = AWAITING_APPROVAL AND awaiting_approval_request_id = :rid. - TransactionCanceledException classified by per-item CancellationReasons. Approval-row failure collapses to 404 REQUEST_NOT_FOUND (no existence oracle per §7.1 finding aws-samples#6); task-row failure → 409 TASK_NOT_AWAITING_APPROVAL. - Optional scope defaults to this_call. - Per-user per-minute rate limit (30/min, synthetic row). - Writes approval_decision_recorded audit event (IMPL-6). Audit failure is logged but does not fail the request — decision is already committed. - deny-task.ts (§7.2) — POST /v1/tasks/{task_id}/deny - Same cross-table pattern; status → DENIED + deny_reason. - Reason is scanDenyReason-sanitized + truncated to DENY_REASON_MAX_LENGTH BEFORE any persistence — agent and audit both read sanitized form; raw input never stored. - Same rate-limit namespace as approve. Tests: +64 total (cedar-policy-parser 24, approval-scope 28, deny-reason-scanner 13, approve-task 14, deny-task 9). Secret test fixtures are assembled from string fragments so the source never holds a contiguous secret literal — Code Defender pre-commit hook otherwise blocks. Stack wiring (task-api.ts routes, agent.ts layer attachment, CreateTaskFn extension, orchestrator + reconciler + fanout + LinkSlackUserFn + GetPolicies + GetPending) lands in the next commit.

Lands the three read/discovery handlers Chunk 6 (CLI) needs to power ``bgagent pending``, ``bgagent policies list/show``, and ``bgagent notifications configure slack``. Completes §15.2 tasks 14, 15, and 25 (handler side). Handlers -------- ``get-pending.ts`` (§7.7 — GET /v1/pending) - Queries ``user_id-status-index`` GSI on TaskApprovalsTable with ``user_id = :caller AND status = :pending``. Without the GSI this would be a full-table Scan per call — under ``watch -n1 bgagent pending`` that exhausts burst capacity for the whole fleet (§10.1 finding aws-samples#8). - Response maps each row to ``PendingApprovalSummary`` with a derived ``expires_at = created_at + timeout_s`` so the CLI can render time-to-timeout without doing arithmetic on ISO strings. - Severity coerced to ``medium`` on unknown values so GSI writes that drift from the enum don't break the list response. - Rate-limited 10/min/user (synthetic row on the same table, namespaced ``RATE#<user>#PENDING`` so it does not collide with the approve/deny counter). ``get-policies.ts`` (§7.6 — GET /v1/repos/{repo_id}/policies) - Combines ``BUILTIN_HARD_DENY_POLICIES`` + ``BUILTIN_SOFT_DENY_POLICIES`` with the repo's ``cedar_policies`` blueprint override. Runs the combined text through ``parseRules`` and returns ``{hard[], soft[]}`` rule summaries. - 5-minute per-repo in-Lambda cache; cold starts throw it away. ``_resetCacheForTests`` exposed for unit-test isolation. - Repo ID is URL-decoded from the path (``owner%2Frepo`` common in CLI UX). - Rate-limited 30/min/user. - Blueprint load failure falls back to built-ins with a WARN log; invalid blueprint cedar text returns 503 ``SERVICE_UNAVAILABLE`` rather than a misleading empty list. ``link-slack-user.ts`` (§11.2 finding aws-samples#4 — POST /v1/notifications/slack/link) - Writes to SlackUserMappingTable with ``ConditionExpression: attribute_not_exists(slack_user_id)``. This guard is the entire admission control the §11.2 design hinges on: even a compromised Slack admin cannot overwrite an existing mapping. - Validates ``slack_user_id`` shape (letters, digits, underscores, 2–40 chars) so junk rows cannot land. - Conflict surface is 409 ``REQUEST_ALREADY_DECIDED`` — reused error code (the payload message directs the user to unlink via support). - Slack link_token end-to-end validation against Slack OAuth is deferred — v1 accepts the token on trust from the Cognito-authed caller; it is persisted in CloudWatch for audit. Supporting primitives --------------------- ``shared/builtin-policies.ts`` — mirrors ``agent/policies/hard_deny.cedar`` and ``agent/policies/soft_deny.cedar`` as TypeScript string constants. Embedded rather than read from disk because Lambda's esbuild bundler does not copy non-TS assets by default and a dedicated bundling hook is more code than the embed. A drift test (``builtin-policies.test.ts``) asserts byte-equality with the agent files so any change on one side without the other flips red at build time. ``shared/cedar-policy.ts`` — ``parseRules`` now skips the unannotated ``base_permit`` entry (both tiers need it as a Cedar catch-all; it is not a user-facing rule so it stays out of ParsedRule[]). This matches the agent-side ``_parse_policy_annotations`` behaviour. Tests: +37 total. - get-pending (8): 401 on missing auth, 429 on rate limit, empty result, GSI query shape, row → PendingApprovalSummary with derived expires_at, severity fallback, missing timeout → expires_at falls back to created_at, 500 on DDB error. - get-policies (11): 401/400 validation, built-in rules listed on empty repo, URL-decoded repo path, custom blueprint rule lands in soft, per-repo cache across calls, 429 rate limit, 503 on invalid blueprint cedar, fallback on load failure, hard rules omit severity / approval_timeout_s, soft rules carry them. - link-slack-user (8): 401/400 validation, shape check, 201 on success, 409 on overwrite attempt, 500 on unknown DDB error, whitespace trim on slack_user_id, ConditionExpression verified. - builtin-policies (4): drift byte-equality with both agent files, parseRules round-trip for hard/soft rule IDs. - cedar-policy (updated): ``base_permit`` is skipped from ParsedRule[] rather than rejected. Stack wiring (task-api.ts routes, agent.ts layer attachment, CreateTaskFn extension, orchestrator + reconciler + fanout) lands in the next commit.

…gent plumbing Completes Chunk 5 end-to-end: the five new Lambdas are instantiated and wired onto the REST API, the orchestrator threads approval-related data through to the agent runtime, the stranded-task reconciler sweeps AWAITING_APPROVAL tasks, and the agent pipeline accepts the new per-task approval configuration. Stack wiring (agent.ts + task-api.ts) ------------------------------------- - TaskApi construct accepts `taskApprovalsTable`, `slackUserMappingTable`, `cedarWasmLayer` props. Approve/Deny/GetPending Lambdas are created when the approvals table is present; GetPolicies also requires the cedar-wasm layer + RepoTable. Slack-link Lambda attaches when the slack mapping table is provided. - New routes: POST /tasks/{task_id}/approve POST /tasks/{task_id}/deny GET /pending GET /repos/{repo_id}/policies POST /notifications/slack/link - GetPoliciesFn configures `memorySize: 512` (Cedar-wasm floor from §15.2 task 10) and externalizes `@cedar-policy/cedar-wasm` from the esbuild bundle so the layer provides the wasm binary at runtime. - CedarWasmLayer compatibleRuntimes extended to include nodejs24.x (the Lambda runtime) — the Node 20/22 list was the original §15.2 spec but the actual function uses Node 24. - agent.ts passes all three new constructs into TaskApi. Orchestrator (shared/orchestrator.ts) ------------------------------------- - `finalizeTask` now treats AWAITING_APPROVAL as a "task still alive" terminal-timeout source: on poll exhaustion the task transitions to TIMED_OUT with a distinct `approval_poll_timeout` reason + error message ("Orchestrator poll timeout exceeded while awaiting approval"). The stranded-approval reconciler is the secondary safety net (§13.6) for tasks the orchestrator already lost track of. - Invocation payload now carries three new fields: - `task_started_at` (ISO 8601 at HYDRATING → RUNNING time) — consumed by the agent hook's `_remaining_maxlifetime_s` so the §6.5 maxLifetime ceiling math uses the real task clock instead of the fail-open fallback. - `approval_timeout_s` (when the submit payload supplied it). - `initial_approvals` (when the submit payload supplied entries). Stranded-task reconciler ------------------------ - Sweeps AWAITING_APPROVAL in addition to SUBMITTED/HYDRATING. - New `APPROVAL_STRANDED_TIMEOUT_SECONDS` env var (default 7200s = 2h) — double §7.3's 1h ceiling so this reconciler never races the happy-path timer. - Distinct failure message on approval-stranded vs generic-stranded so users see "approval stranded — container evicted" rather than the misleading "no pipeline attached" copy. Fanout (handlers/fanout-task-events.ts) --------------------------------------- - Slack channel default set replaces the forward-compat `approval_required` stub with the real §11.1 events: `approval_requested` and `approval_stranded`. Other approval milestones (granted/denied/timed_out/late_win/etc.) stay out of default routing to avoid notification fatigue — the CLI surfaces those confirmations directly. - Email default replaces `approval_required` with `approval_requested` (high-severity gates only; severity gating happens in the dispatcher). Create-task validation (shared/create-task-core.ts) --------------------------------------------------- - New request fields: - `approval_timeout_s` — integer within `[APPROVAL_TIMEOUT_S_MIN, APPROVAL_TIMEOUT_S_MAX]`. - `initial_approvals` — array of scope strings; each entry must be a valid `ApprovalScope` per `parseApprovalScope`; bash_pattern and write_path scopes get the §7.4 degenerate-pattern check. - TaskRecord extended with `approval_timeout_s`, `initial_approvals`, `approval_gate_count` (seeded to 0 at admission), and `awaiting_approval_request_id` (written atomically by the agent's `transact_write_approval_request` primitive). Agent plumbing (models.py / config.py / pipeline.py / runner.py / server.py) ---------------------------------------------------------------------------- - `TaskConfig` adds `approval_timeout_s`, `initial_approvals`. - `build_config`, `run_task`, `_run_task_background`, and `_extract_invocation_params` thread the two new fields from payload → config → PolicyEngine. - `server._extract_invocation_params` stamps `os.environ["TASK_STARTED_AT"]` from the payload so the hook's `_remaining_maxlifetime_s` returns real values (carry-forward from Chunk 3 resolved). - `runner.py` constructs PolicyEngine with `initial_approvals` + `task_default_timeout_s` when supplied; the engine clamps bad values at construction time. Tests ----- All CDK tests pass: 1219 / 1219. All agent tests pass: 648 / 648. Affected suites (changes only): - test/stacks/agent.test.ts: cedar-wasm layer CompatibleRuntimes now expects `nodejs24.x`; table count still 8. - test/constructs/cedar-wasm-layer.test.ts: same runtime expansion. - test/handlers/fanout-task-events.test.ts: approval_required → approval_requested/approval_stranded in Slack default set; approval_required → approval_requested in Email default set. - test/handlers/reconcile-stranded-tasks.test.ts: primeResponses now queue a third `Items: []` for AWAITING_APPROVAL queries; queryCalls assertion bumped to 3. Carry-forward (non-blocking) ---------------------------- - GetPoliciesFn has write access to TaskApprovalsTable (for the rate-limit counter path). A future permissions audit should tighten this to a single-item write scoped to `RATE#<user>#*`. - TASK_STARTED_AT env var is only set when a payload supplies it; server.py still supports the Phase 2 no-payload startup path.

Ships the four user-facing commands that close the Cedar HITL loop: once Chunks 1-5 have a PENDING approval row and the Slack/Email fan-out has notified the user, Chunk 6 is how they actually respond. New commands (cli/src/commands/) -------------------------------- - `bgagent approve <task-id> <request-id> [--scope <scope>] [--yes]` Default scope is `this_call`; callers extend allowlist with `tool_type:Bash`, `rule:<id>`, etc. `all_session` is the only scope that requires `--yes` to confirm — mirrors the safety UX from §8.4. Error classification maps 404 → "run `bgagent pending`", 409 → "task no longer awaiting approval", 429 → rate-limit, 401 → login. - `bgagent deny <task-id> <request-id> [--reason ... | --reason-file ...]` `--reason-file` accepts multi-line reasons that would otherwise need shell quoting. Client-side `DENY_REASON_MAX_LENGTH` cap avoids a round-trip on obviously-too-long reasons; the server still truncates. Reason is sanitized server-side (output_scanner) before ever reaching the agent. - `bgagent pending [--output text|json]` Lists every PENDING approval owned by the caller. Rendered with approve/deny hints inline so the user can copy-paste the next command. JSON output for scripting. Rate-limited server-side. - `bgagent policies list --repo <owner/repo> [--tier hard|soft]` `bgagent policies show --repo <owner/repo> --rule <rule_id>` Discovery commands so users can find rule IDs without reading CDK source. Both subcommands reuse a single `listPolicies` API call and filter locally. Wire changes ------------ - `cli/src/api-client.ts`: `approveTask`, `denyTask`, `listPending`, `listPolicies` — each matching the §7.1 / §7.2 / §7.6 / §7.7 request/response shapes. `approveTask` omits the `scope` body field when unset so the server's `this_call` default applies. - `cli/src/types.ts`: mirrors the Chunk 5 server types verbatim — `ApprovalScope` union, `ApprovalRequest/Response`, `DenyRequest/Response`, `PendingApprovalSummary`, `GetPoliciesResponse`, `PolicyRuleSummary`, plus the five constants (`DENY_REASON_MAX_LENGTH`, `INITIAL_APPROVALS_MAX_ENTRIES`, `INITIAL_APPROVALS_MAX_ENTRY_LENGTH`, `APPROVAL_TIMEOUT_S_MIN/MAX/DEFAULT`). - `cli/src/bin/bgagent.ts`: registers the four new commands in the order they appear in help output. Tests: +27 new (217 total). - approve (9): default scope, custom scope, all_session guard + `--yes` bypass, JSON output, 404/409/401/429 error classifications. - deny (6): no-reason path, `--reason`, `--reason-file` with tmpdir fixture, mutually-exclusive rejection, over-length rejection, 404 classification. - pending (5): empty render, populated render with approve/deny hints, JSON output, 401 and 429 classifications. - policies (7): list both tiers, `--tier` filter, `--output json`, bad `--tier`, show found rule, show unknown rule, 404 repo-not-onboarded classification. Carry-forward ------------- - `submit.ts` extension with `--approval-timeout` / `--pre-approve` flags is deferred to a follow-up commit — the server already accepts these fields on POST /v1/tasks (Chunk 5), and `bgagent submit` already forwards unknown payload fields through the existing request path, so users can set them via `--body-file` today until the explicit flags land. - `--output json` on error branches currently returns a CliError instead of a JSON error envelope; matches the pattern the existing commands use (status, cancel, nudge). Follow-up to standardize JSON error envelopes across the whole CLI if that becomes a common scripting pain point.

…ervability Persist approval_gate_count to TaskTable across container restarts per §13.6 so the cumulative gate budget survives eviction. Emit pre_approvals_loaded after PolicyEngine init per §4 step 7 / §11.1 so operators see the starting approval posture in the live SSE stream. Add IMPL-23 cache-hit observability: cache hits attach metadata to PolicyDecision, hook forwards to new write_policy_decision_cached progress helper (decision_source="recent_decision_cache"). Why: container restarts were silently resetting the per-task gate counter, re-exposing users to another approvalGateCap-worth of gates per restart. Cache-driven denies were invisible in TaskEventsTable beyond the initial gate. Fresh tasks emitted no "starting posture" signal so dashboards could not distinguish "no pre-approvals seeded" from "agent has not started". Surface additions: - task_state.increment_approval_gate_count_in_ddb — best-effort atomic ADD on approval_gate_count - PolicyEngine(initial_approval_gate_count=N) — seed session counter - TaskConfig.initial_approval_gate_count — orchestrator payload field - progress_writer.write_policy_decision_cached — IMPL-23 emitter - PolicyDecision.cache_hit_metadata — observability-only field - _CachedDecision.original_decision_ts — wall-clock preservation - runner._initialize_policy_engine_and_hooks — extracted helper Counter survival is a safety bound, not correctness: DDB failure does NOT block the gate (§13.6). Joint-update invariant on status + awaiting_approval_request_id (§10.2) is preserved — counter uses separate UpdateItem, not merged into resume transaction. Tests: +36 agent (648→684), +8 CDK (1219→1227), +6 new runner tests.

Capture the per-task approval-gate cap at submit-time (§4 step 5, decision aws-samples#13, §13.6) so a blueprint-configured override is frozen onto the TaskRecord. Mid-task blueprint edits cannot shift the cap beneath a running task; container restarts re-seed the agent's PolicyEngine from the persisted value instead of its compile-time default-50. Why: Chunk 7a added approval_gate_count persistence but the cap itself was still resolved from the blueprint on every restart — so an operator lowering security.approvalGateCap mid-task would retroactively fail-close the running task. The design has always said cap is frozen at submit; this chunk makes the implementation match. Surface additions: - BlueprintProps.security.approvalGateCap (CDK, synth-validated [1, 500] integer) — new per-repo blueprint prop - RepoConfig.approval_gate_cap + BlueprintConfig.approval_gate_cap - TaskRecord.approval_gate_cap + APPROVAL_GATE_CAP_{MIN,MAX,DEFAULT} - create-task-core now calls loadRepoConfig, resolves cap, bounds- checks, persists; returns 503 SERVICE_UNAVAILABLE on invalid blueprint data (permanent until admin re-deploys, not transient) - orchestrator.ts: isValidApprovalGateCap integer+bounds guard; logs warn if a persisted cap is structurally invalid (schema drift / hand-edited DDB row) - TaskConfig.approval_gate_cap: int | None = None (agent-side); runner threads to PolicyEngine kwarg when not None - "Task created" log line now carries approval_gate_cap + approval_gate_cap_source ("blueprint" | "platform_default") so operators can detect a broken-plumbing deploy at the single chokepoint where all fallback layers converge Per silent-failure review: - HIGH: 500 → 503 + logger.error for permanent misconfig - HIGH: cap + source in task-created log (catches 4-layer cascade) - MEDIUM: orchestrator guard tightened past typeof (NaN, Infinity, floats, out-of-bounds all omitted + warned) Tests: CDK 1263/1263 (+36), agent 694/694 (+10). CLI unchanged.

… warn path Close three deferred items from Chunks 7a/7b before Chunks 8-10: - runner.py init log now carries approval_gate_cap=N + approval_gate_cap_source=threaded|engine_default. Matches the handler log key so CloudWatch Insights can join across the cascade; agent can't distinguish blueprint-override from platform-default-frozen (handler log is the ground truth). - server.py adds _warn_cw helper routing [server/warn] lines to a dedicated CloudWatch stream (server_warn/<task_id>). stdout print is preserved for local dev + existing capsys tests. AgentCore does not forward container stdout to APPLICATION_LOGS, so pre-7c warnings about malformed invocation payloads were invisible in production. Failure counter shared with _debug_cw for a single alarm surface; hoisted above writer defs for import-time ordering safety. - blueprint.ts emits a synth-time info annotation when security.approvalGateCap is omitted so operators see a signal that the repo will rely on the platform default of 50. Without this, the default was a silent fallback at the handler layer — only visible by inspecting a TaskRecord at runtime. Tests: agent 694→700 (+6), cdk 1263→1265 (+2), cli unchanged. Design refs: §4 step 5, §11.1, §13.6, decision aws-samples#13.

Add created_at / effective_timeout_s / matching_rule_ids to approval_granted / approval_denied / approval_timed_out events so the incoming ApprovalMetricsPublisher Lambda (Chunk 8b) can compute decision latency and emit a rule_id-dimensioned timeout breakdown without a round-trip GetItem against TaskEventsTable. Fields are added conditionally — omitted from metadata when the caller did not supply them — so the event stream stays free of null-value noise and legacy callers continue to produce valid payloads. Publisher handles missing fields via explicit skip-and-log on the specific metric branch (not fallback-to-zero). Agent tests extended: +6 progress_writer tests, +3 hooks tests. Baseline 700 → 710. No consumer wired yet — this commit is a forward-compatible superset; Chunk 8b ships the CDK publisher + dashboard widgets.

…atch dashboard widgets Ship the Cedar-HITL dashboard widgets from §11.3 / IMPL-28 via the MetricsPublisher architecture (Option E): - New ApprovalMetricsPublisher Lambda consumes TaskEventsTable DDB stream as consumer aws-samples#2 (FanoutConsumer is aws-samples#1; stream is within its 2-consumer soft cap — documented in task-events-table.ts). - Handler emits CloudWatch EMF for 3 metrics in namespace ABCA/Cedar-HITL: * ApprovalRequestCount + ClippedApprovalCount (reason dim) → ApprovalTimeoutClipRate widget (MathExpression with IF-guard against NaN on zero-denominator periods) * TimedOutEffectiveTimeout (rule_id dim with allowlist cardinality cap) → ApprovalTimeoutBreakdown widget * ApprovalDecisionLatencyMs (outcome dim) → ApprovalDecisionLatency widget with per-outcome p50/p90/p99 - Observability-of-observability (silent-failure review): * MetricsPublisherHeartbeat per batch so dashboard gaps distinguish "no traffic" from "pipeline broken" * MetricEmitSkipped with a reason dim on schema mismatches, parse anomalies, unknown rule ids — never fall back to latency=0 or count=0 which would poison percentile widgets * Expected high-volume skip reasons (non-milestone events, REMOVE records) DO NOT emit MetricEmitSkipped — only anomaly reasons (missing keys, missing milestone name) do, so real signal isn't drowned * Structured log lines alongside every skip so the absence of metrics is also observable via CloudWatch Logs Insights - Cardinality caps via ``RULE_ID_ALLOWLIST`` + ``normalizeClipReason``. Unknown values collapse to ``other`` / ``unknown`` buckets with dashboard series so the collapse is discoverable rather than silently accruing custom-metric cost. - Event-source-mapping filter pattern rejects non-agent_milestone records at the service layer; handler-layer allowlist catches anything that slips through. Filter pattern correctness tested structurally + positively/negatively probed (silent-failure H3). - Per-record try/catch + reportBatchItemFailures + SQS DLQ mirror the fanout-task-events.ts poison-pill pattern exactly. Deferred to Chunk 10 chore issues: - DLQ alarms (fanout + publisher) — fire-into-void until notification channel lands, so wire with §11.5 alarms as a group - Explicit log-group declaration (IAM drift defense) - stdout-flush race documentation (pre-existing pattern in fanout) - EMF 100-updates/sec throttle alarm Tests: cdk 1265 → 1327 (+62); agent 710 (unchanged); cli 217 (unchanged). All pass. §11.5 alarm plumbing now unblocked — publisher provides the metrics infrastructure the design always intended; only the notification-channel SNS wiring is left.

Bring CEDAR_HITL_GATES.md current with the code that shipped in Chunks 7b (approval_gate_cap persist), 8a (outcome event schema superset), and 8b (ApprovalMetricsPublisher + dashboard widgets): - §10.2 adds the missing approval_gate_cap row (carry-forward drift from Chunk 7b). Bounds + frozen-at-submit semantics documented. - §11.1 outcome events (approval_granted / approval_denied / approval_timed_out) now document the Chunk 8a optional fields (created_at, effective_timeout_s, matching_rule_ids) plus the publisher's skip-on-missing-field policy. - §11.1 intro names ApprovalMetricsPublisherFn as consumer aws-samples#2 and points to §11.3 for the metric schema. - §11.3 rewritten to describe the Option E architecture: publisher Lambda + EMF + native CloudWatch metrics in namespace ABCA/Cedar-HITL, MathExpression with divide-by-zero guard, rule_id cardinality cap, observability-of-observability via heartbeat + skip meta-metrics, widget layout (12/12 over 24), 2-consumer stream budget. Dropped the stale "Retired the old bundled widget" line — that widget never shipped. - §11.5 reframed as "deferred (notification-channel gated)" with a plumbing-status paragraph noting the metric infra now exists; only SNS wiring remains. Alarm list expanded to include DLQ and publisher-health alarms. - §16 IMPL-28 rewritten for Option E; §15.2 row 46 expanded to reference the 4 new test files; Appendix B checklist updated. Starlight mirror regenerated via ``cd docs && node scripts/sync-starlight.mjs``. No code changes. Test baselines unchanged. Adversarial comment-analyzer review verified every new claim against committed code — zero inaccuracies.

…2 mediums Full-branch adversarial review (code-reviewer + silent-failure-hunter on all 18 commits) surfaced findings that only appear at final-state. Addressing the blockers + low-cost meds before deploy: B2 — stranded approvals were invisible to the dashboard: - Reconciler writes ``event_type: 'task_stranded'``; the metrics publisher's event-source filter only accepts ``event_type: 'agent_milestone'``, so AWAITING_APPROVAL evictions produced zero §11.3 signal. - Fix: reconciler now additionally emits an ``agent_milestone`` with ``milestone: 'approval_stranded'`` when the stranded task was AWAITING_APPROVAL. Publisher allowlist extended; classifier emits ``ApprovalStrandedCount`` counter. SUBMITTED / HYDRATING stranded events unchanged (guarded by test). B1 — heartbeat comment was false reassurance: - Event-source filter blocks Lambda invocation when no ``agent_milestone`` records exist in the poll window, so a quiet period produces the same widget gap as a broken pipeline. The code + design-doc wording claimed "gap = pipeline broken" which would mislead the on-call. - Fix: corrected module + function docstrings to describe the heartbeat as "present when active, not pipeline-alive-always." Operators should alarm on the combination (heartbeat-absent + recent TaskEventsTable traffic) or wire a scheduled canary — the latter tracked as a §11.5 follow-up. M1 — safety-critical milestones produced zero dashboard signal: - ``approval_cap_exceeded`` (§12.9 per-task cap) and ``approval_rate_limit_exceeded`` (per-user per-minute rate) were emitted by the agent but not on the publisher allowlist. A production bug where every gate hit the cap would have been invisible. - Fix: both added to APPROVAL_METRIC_MILESTONES with ``ApprovalCapExceededCount`` / ``ApprovalRateLimitExceededCount`` counters. No dimensions — the request_id in the event carries per-user correlation for ad-hoc log-insights investigation. H2 — filter / handler eventName disagreement: - Event-source filter required ``INSERT``; handler accepted ``INSERT`` and ``MODIFY``. Benign today (TaskEventsTable is put-only), but a future chunk MODIFY-ing records would be silently dropped by the filter while the handler was ready to process them. - Fix: handler now INSERT-only, matching the filter. Single source of truth on the eventName invariant. M1-rename — ``expected_non_approval_milestone`` skip reason was misleading (the non-metric approval milestones like ``approval_late_win`` also land in this bucket). Renamed to ``expected_milestone_not_tracked``. Tests: cdk 1327 → 1332 (+5: 3 classifier branches for new metrics, 1 reconciler AWAITING_APPROVAL path, 1 SUBMITTED-not-double-counted guard). Agent + CLI unchanged. All pre-commit hooks green; pre-push security fails only on the 3 pre-existing CVEs tracked for chore issue filing. Deferred findings from the same review (file as chore issues): - H1: agent-dies-between-TIMED_OUT-and-resume loses latency (edge, affects p99 bias) - H3: late-win APPROVED created_at staleness invariant (works today, document invariant) - H4: _warn_cw daemon-thread burst under adversarial payload - M2-M4: late-win metric, rename helpers, etc. No upstream PR filing this chunk — deploy to Sam's AWS account for integration testing first.

…overflow policies Synth + deploy were blocked by cdk-nag: the Cedar HITL additions (TaskApprovalsTable grant + SlackUserMappingTable + extra env vars threaded to the AgentCore runtime) pushed the runtime ExecutionRole past CDK's inline-policy size limit, so CDK auto-splits excess statements into ``OverflowPolicy1``. The overflow inherits the same wildcard ``bedrock:InvokeModel*`` / CloudWatch actions as the base policy but lives at a path (``Runtime/ExecutionRole/OverflowPolicy1/Resource``) that the existing ``addResourceSuppressions(runtime, ..., applyToChildren: true)`` cannot reach — CDK creates overflow policies lazily during synth ``prepare()``, after the construct tree has been frozen and after static suppressions have been cached. Suppress via an Aspect at MUTATING priority so the suppression is applied before cdk-nag's READONLY visitor runs. Matches any path containing ``/Runtime/ExecutionRole/OverflowPolicy`` + ending ``/Resource`` so future ``OverflowPolicy2``, etc. are covered without hardcoding indices. Verified: ``mise //cdk:synth`` now completes cleanly. ``mise //cdk:test`` still 1332/1332.

…gate + CLI error visibility Three E2E T1.4 + T2.2 findings from the Chunk 10 integration-test session. Batched into one commit since all three need the same redeploy to verify: 1. agent/Dockerfile: COPY policies/ into the container image. ``PolicyEngine.__init__`` reads ``/app/policies/hard_deny.cedar`` + ``soft_deny.cedar`` at import time via ``_POLICIES_DIR = Path(__file__).parent.parent / "policies"``. The Dockerfile only copied ``src/``, so the directory was missing and every Cedar-HITL task failed at 0 turns with ``missing built-in hard-deny policies``. Introduced alongside Chunk 2 when the policy files were first added — Dockerfile was never updated. Zero tasks on this branch ever succeeded in deployed form until now; unit + Jest tests never caught it because they don't exercise the container layout. 2. cdk/src/handlers/get-policies.ts: add checkRepoOnboarded gate. Previously the handler was lenient (loaded RepoConfig best- effort, fell through to built-ins on miss), producing 200 with the full built-in set for any arbitrary repo string. Users typo-ing a repo name mistook the response for proof the repo was onboarded. Now consistent with POST /tasks — 422 REPO_NOT_ONBOARDED for any repo without an active RepoConfig row. Gate runs AFTER the rate-limit so the 429 doesn't leak onboarding-status via a 422-vs-200 timing oracle (+1 test covering this). +2 tests total. 3. cli/src/format.ts + cli/src/commands/watch.ts: describeReason + formatTerminalMessage now surface the raw error_message when the classifier's catch-all UNKNOWN fires. Previously they always preferred error_classification.{category, title}, turning the concrete string "missing built-in hard-deny policies: /app/policies/ hard_deny.cedar" into the useless "unknown: Unexpected error" label on the user's terminal. For KNOWN classifications (e.g. guardrail: PR context blocked) the new behavior appends the first line of error_message as concrete evidence — so even when the category is known, users see the actual diagnostic inline. +3 test cases covering the UNKNOWN fall-through and the KNOWN-with-detail path; adjusted 2 existing assertions. Tests: cdk 1332 → 1334 (+2); cli 217 → 220 (+3); agent 710 (unchanged — no agent code touched). All pre-commit hooks pass. Pre-push fails only on the 3 pre-existing CVEs carried from main. Not yet fixed (tracked in .e2e-test-plan.md "Surprises" section for later chore-issue filing): - telemetry.py::_METRICS_REDACT_KEYS scrubs error strings too aggressively — dashboard METRICS_REPORT events show "[redacted]" for every error including ones with zero secret-leak risk. Should run output_scanner.py pattern match instead of blanket substitution. TaskRecord error_message (which the CLI reads) is unaffected; only the dashboard widget suffers. - Container stdout goes to /aws/bedrock-agentcore/runtimes/<rt>-DEFAULT log group, not APPLICATION_LOGS. Dashboard LogQueryWidgets can't see agent-fatal ERROR lines. Fix is either a dashboard widget pointing at the runtime log group OR _warn_cw calls on fatal paths.

…roval-timeout/--pre-approve Five fixes batched from the 2026-05-11/12 E2E validation pass — all found by manual integration testing, none were caught by unit tests because the defects were at plumbing seams between correctly-tested components. 1. agent/src/runner.py dropped ``config.user_id`` when wiring hook matchers, so every approval-gate row landed ``user_id=""`` on the ``user_id-status-index`` GSI key. DDB rejected every ``TransactWriteItems`` with ValidationException and the PreToolUse hook fell through to ``_deny_response``. Symptom: agent "completed" force-push tasks with zero visible gating. 2. ``toTaskDetail`` dropped ``approval_gate_count``, ``approval_gate_cap``, and ``awaiting_approval_request_id`` from every task API response — the fields were populated on the TaskRecord but never serialized. ``bgagent status`` couldn't report gate posture. 3. ``GET /pending`` dropped ``matching_rule_ids`` (DB had it, handler didn't map it, type omitted it). Users couldn't see WHICH rule fired on a gate. 4. ``TaskApprovalsTable`` GSI projection didn't include ``matching_rule_ids``. Fixed by destructive recreate under a new construct id (``TaskApprovalsTableV2``) since DDB rejects in-place nonKeyAttributes updates. Design doc §10.1 now carries a "projection is fixed at design time" note and the construct test locks the list via ``Match.arrayWith``. 5. ``bgagent submit`` lacked ``--approval-timeout`` and ``--pre-approve`` flags (server accepted them, CLI didn't expose them). Blocked Phase 5/6 E2E tests on raw curl. Flags are repeatable (``--pre-approve``) and client-side- validated against server constants (APPROVAL_TIMEOUT_S_MIN/MAX, INITIAL_APPROVALS_MAX_ENTRIES). Test deltas: agent 710→711, CLI 220→232 (+12), CDK 1334→1336 (+2). All seven E2E phases re-run post-redeploy; all green except the two dashboard-visual checks that need Sam's eyes.

Batched here because all seven touch the same chunk-3/5/7/8/10 surface area and none of them needed to ship as their own deploy. The set: B1 — Strengthen the agent's response to a user DENY. Phase 4 observed the agent treating "User denied" as "try a different approach" and burning through max_turns on trivial variations of the same rule. Two fixes: * Wrap the reason injected into the model with AUTHORITATIVE-prefixed stop-language that names the matching rule(s) and tells the agent subsequent attempts will fast-deny. (agent/src/hooks.py) * Extend ``RecentDecisionCache`` with a rule-level cache keyed by ``(tool_name, rule_id)`` so semantic retries hit the cache even when the input hash differs. Populated only on DENIED (TIMED_OUT stays input-hash scoped because it's ambiguous user-intent). (agent/src/policy.py) B2 — Replace blanket ``[redacted]`` substitution on METRICS_REPORT ``error`` fields with ``output_scanner.scan_tool_output``-based pattern matching. Structural errors like "missing built-in hard-deny policies: /app/policies/hard_deny.cedar" now survive to the dashboard Recent-Events widget; real secret patterns (AWS keys, Bearer tokens, connection strings) still get ``[REDACTED-<LABEL>]``. (agent/src/telemetry.py) B3 — Mirror fatal agent ERROR lines to the APPLICATION_LOGS CloudWatch group via a new ``log_error_cw`` helper. Previously ``log("ERROR",…)`` wrote only to container stdout, which AgentCore routes to runtime-DEFAULT, so agent-fatal errors were invisible on both the TaskDashboard widgets and ``bgagent status``. Swap the three fatal call sites in pipeline.py + runner.py. (agent/src/shell.py) B4 — Document the get-policies 422 REPO_NOT_ONBOARDED gate (landed 2026-05-11 in fb69894) in design-doc §7.6, including the "gate runs AFTER rate-limit to avoid timing oracle" note. B11 — Declare explicit LogGroups for ApprovalMetricsPublisherFn and FanOutFn so Lambda-created-at-first-invoke log groups don't have an implicit grant graph and unbounded retention. C5 — Call out the Cognito pool constraints in USER_GUIDE.md + QUICK_START.md: username MUST be an email, password policy is min 12 chars with all four character classes, ``email_verified=true`` is required at create time, and ``--message-action SUPPRESS`` stops Cognito from attempting an SES email on new-user creation. C10 — Collapse the two RepoTable GetItems on the task submit path (``checkRepoOnboarded`` + ``loadRepoConfig``) into a single ``lookupRepo`` call that returns both verdicts. Test additions: regression guard asserts exactly one GetItem fires on submit. Test deltas: agent: 721 → 730 (+9 from rule-cache + telemetry-redaction tests) cli: 232 → 232 (no surface change) cdk: 1336 → 1337 (+1 from single-GetItem regression guard) Docs-sync mirror regenerated.

Resolves 13 conflicts across agent, cli, and cdk. Notable decisions: - agent/pyproject.toml: took upstream's 7 dep bumps (boto3, claude-agent-sdk, requests, fastapi, uvicorn, aws-opentelemetry-distro, mcp) but held cedarpy==4.8.0 exact to preserve the @cedar-policy/[email protected] parity contract documented in mise.toml. - agent/src/pipeline.py, runner.py, server.py: additive merges — kept both approval_* param set (ours, Cedar HITL) and channel_* param set (upstream, Slack/Linear). Preserved the log_error_cw wiring that mirrors fatal ERRORs to APPLICATION_LOGS. - cli/src/api-client.ts, bin/bgagent.ts: kept both import sets — GetPendingResponse/GetPoliciesResponse + approve/deny/pending/policies subcommands (ours) plus LinearLinkResponse + slack/linear subcommands (upstream). - cdk/src/stacks/agent.ts: merged aws-cdk-lib imports (AspectPriority + Aspects + Fn) and construct imports (SlackIntegration + SlackUserMappingTable). Resolved duplicate SlackUserMappingTableName CfnOutput by adopting upstream's SlackIntegration construct. - cdk/src/handlers/shared/create-task-core.ts: merged type imports — kept our 7 HITL constants (APPROVAL_GATE_CAP_{MIN,MAX,DEFAULT}, APPROVAL_TIMEOUT_S_{MIN,MAX,DEFAULT}, INITIAL_APPROVALS_MAX_ENTRIES) and upstream's ChannelSource + DEFAULT_MAX_TURNS. Surgical retirement: deleted HEAD-side link-slack-user.ts + associated SlackUserMappingTable (PK slack_user_id → cognito_sub) in favor of upstream's richer OAuth-linking flow (composite slack_identity PK + PlatformUserIndex GSI, 2-step Cognito-authed link via slash command + CLI). HEAD's mapping table had no reader anywhere in the codebase — nothing functional lost. Slack-button approval UX (Cedar HITL approve/ deny action_ids) deferred to a follow-up issue; it extends upstream's slack-interactions.ts cleanly. CVE cleanup: pinned astro==6.1.10 exact in docs/package.json to close GHSA-xr5h-phrj-8vxv. Exact pin avoids the transitive CVE churn at 6.3.x (fast-xml-parser, yaml). Merge also cleared the previously- residual urllib3 and basic-ftp CVEs tracked in project_pending_cve_followups.md. Test baselines: - agent: 756/757 passing (1 local-env-only failure blocked by Amazon git-defender; unrelated to this change) - cli: 238/238 passing across 22 suites - cdk: 1418/1418 passing across 83 suites - All pre-commit + pre-push hooks green; security:deps reports zero CVEs

@Severity

- USER_GUIDE: new §Approval gates (Cedar HITL) section covering pending/approve/deny/policies commands, scope reference, and pre-approval flow. Extends submit options with --approval-timeout and --pre-approve. Adds AWAITING_APPROVAL to the lifecycle state machine + statuses table. Adds three new approval events (approval_requested, approval_recorded, approval_timed_out). - QUICK_START: new §Step 7 end-to-end walkthrough (policies list → submit → watch → pending → approve/deny with reason injection → pre-approval variant for unattended runs). - CEDAR_POLICY_GUIDE (new): blueprint-author reference for writing @tier("hard")/@tier("soft") rules. Covers vocabulary (execute_bash / write_file / context.command / context.file_path), annotation reference (@rule_id, @Severity, @approval_timeout_s, @category), 4 worked patterns (rm -rf /, force-push main, env files, migrations), multi-match behavior (min timeout, max severity), task-start validation errors, capacity budgets (approvalGateCap, maxPreApprovalScope), and cross-engine parity testing via contracts/cedar-parity/ fixtures. - DEVELOPER_GUIDE: new §Writing Cedar policies for the repo subsection pointing to the new guide from §Repository preparation. - sync-starlight + astro sidebar: wire CEDAR_POLICY_GUIDE into the /customizing/cedar-policies route.

…+ resource tagging Second upstream merge to stay current before PR review. Brings in 3 commits since the previous merge at f36d352: - ff79c24 chore(deps): bump astro from 6.1.6 to 6.1.10 Upstream caught up to the exact pin we already have; kept our astro: 6.1.10 exact pin (not upstream's ^6.1.10) to block the lockfile from drifting to 6.3.2 and re-introducing transitive CVEs (fast-xml-parser, yaml). security:deps: zero CVEs. - a59ca35 feat(cdk): add github:* resource tagging strategy Auto-merged; only surfaced in our stack via a new ArnFormat import in agent.ts (resolved inline). - 9592796 feat(fanout): migrate SlackNotifyFn to FanOutConsumer subscriber (aws-samples#64). Significant upstream refactor (11 files, ~2000 lines) moving Slack outbound delivery from a standalone DDB Streams consumer onto FanOutConsumer. Drops TaskEventsTable from 2 concurrent stream readers to 1, restoring headroom for future channels. Our Cedar HITL surface is unaffected. Conflicts resolved (5): - docs/package.json: kept our exact astro pin (see above). - yarn.lock: regenerated clean via yarn install. - cdk/src/stacks/agent.ts: merged aws-cdk-lib import — kept ours (AspectPriority, Aspects for cdk-nag overflow suppression) and added upstream's ArnFormat (new resource-tagging strategy). - cdk/src/handlers/fanout-task-events.ts: merged the CHANNEL_DEFAULTS comment block — kept upstream's richer rationale for task_created/session_started inclusion and pr_created exclusion, preserved the Cedar HITL approval-gate phrasing. - cdk/test/handlers/fanout-task-events.test.ts: two drift-guard tests updated to reflect the merged CHANNEL_DEFAULTS.slack contents (our approval_requested/approval_stranded + upstream's session_started). The forwardCompat set (events in the filter that don't yet have a Slack renderer) now lists both approval events plus status_response — Slack-button approval renderers are a deferred follow-up. Tests: cdk 1526/1526 across 85 suites. Full pre-commit + pre-push hooks green; security:deps reports zero CVEs.

krokoko · 2026-05-15T14:41:43Z

Thanks ! Automated review:

Critical Issues (3 found)

C1. Missing fail-closed guard on _pre hook wrapper

Files: agent/src/hooks.py (the _pre function in build_hook_matchers)
Found by: Silent Failure Hunter

The _pre wrapper around pre_tool_use_hook has no try/catch. If _handle_require_approval raises an uncaught exception (e.g., asyncio.CancelledError, unexpected TypeError), the
exception propagates to the Claude Agent SDK's hook dispatch. If the SDK treats a hook crash as "allow the tool call," this is a fail-open security violation. Notably, the _post and
_stop wrappers do have try/catch blocks, making this omission conspicuous and likely an oversight.

Fix: Add a fail-closed catch to _pre that returns permissionDecision: "deny" on any unhandled exception.

C2. CreateTaskRequest type drift — missing attachments field in CLI

Files: cdk/src/handlers/shared/types.ts vs cli/src/types.ts
Found by: Code Reviewer

The CDK CreateTaskRequest includes attachments?: Attachment[] and defines the Attachment interface, but the CLI mirror has neither. This violates the explicit AGENTS.md contract:
"Shared API request/response shapes must stay in sync."

Fix: Add Attachment interface and attachments field to cli/src/types.ts.

C3. prompt_version field missing from CLI TaskDetail

Files: cdk/src/handlers/shared/types.ts vs cli/src/types.ts
Found by: Code Reviewer

CDK's TaskDetail has readonly prompt_version: string | null; but the CLI mirror omits it. Another AGENTS.md sync violation.

Fix: Add readonly prompt_version: string | null; to CLI's TaskDetail.

Important Issues (5 found)

I1. wrote = True on TIMED_OUT exception ignores user's approval

File: agent/src/hooks.py ~line 417-428
Found by: Silent Failure Hunter

When the TIMED_OUT status write raises an exception, the code sets wrote = True, which skips the IMPL-24 re-read path. If the user's APPROVED decision landed on the row during a
transient network error, the agent falsely denies the tool call. This contradicts the purpose of the IMPL-24 re-read mechanism.

Fix: Change to wrote = False on exception so the re-read path checks for a late approval.

I2. write_terminal precondition excludes AWAITING_APPROVAL

File: agent/src/task_state.py, write_terminal function
Found by: Code Reviewer

The ConditionExpression only accepts RUNNING, HYDRATING, and FINALIZING as prior statuses. If the container crashes while in AWAITING_APPROVAL, the shutdown handler cannot transition
the task — it hangs for up to 2 hours until the stranded-task reconciler catches it.

Fix: Add AWAITING_APPROVAL to the condition expression.

I3. task_state.py uses print() instead of structured logging

File: agent/src/task_state.py (8+ locations)
Found by: Silent Failure Hunter

All error logging uses bare print() instead of shell.log(). These messages don't route to CloudWatch Log Insights, can't trigger alarms, and have no structured fields for correlation.
DDB write failures for approval state transitions are invisible to all monitoring systems.

Fix: Migrate to from shell import log with structured fields.

I4. Scope validation failures not logged in approve-task.ts

File: cdk/src/handlers/approve-task.ts ~line 99-107
Found by: Silent Failure Hunter

A failed parseApprovalScope returns 400 but is not logged. This is security-relevant — malformed scopes could indicate probing, CLI bugs, or version mismatches.

Fix: Add logger.warn with task_id, raw_scope, error, and user_id before the 400 return.

I5. Agent-side DDB transaction functions lack unit tests

File: agent/tests/test_task_state.py
Found by: Test Analyzer (criticality 9/10)

transact_write_approval_request, transact_resume_from_approval, and best_effort_update_approval_status are tested only via the _FakeTaskState double in hook tests. The real DDB
condition expressions (the actual fail-closed enforcement layer) are never exercised against mocked DDB clients. A typo in a condition expression could bypass atomic state
transitions.

Suggestions (10 found)

S1. No concurrent approve + deny test (Test Analyzer, 8/10)

No test verifies that simultaneous approve and deny on the same request_id results in exactly one winning. The DDB condition guards make this safe, but the invariant is unpinned.

S2. _poll_for_decision consecutive-failure path untested (Test Analyzer, 8/10)

The POLL_MAX_CONSECUTIVE_FAILS (10 failures → timeout) safety path has no direct test. Only the all-PENDING timeout path is tested.

S3. ApprovalRecord should be a discriminated union on status (Type Analyzer)

A PENDING record has no decided_at/scope; an APPROVED record requires both. Currently all fields are optional on every status — illegal combinations are representable.

S4. Severity literal 'low' | 'medium' | 'high' repeated inline 6+ times (Type Analyzer)

Used across ApprovalRecord, PendingApprovalSummary, PolicyRuleSummary, ParsedRule, etc. A shared Severity type alias would prevent drift.

S5. formatMinuteBucket duplicated in 3 Lambda handlers (Code Reviewer)

Identical helper in approve-task.ts, deny-task.ts, and get-pending.ts. Should be extracted to a shared module.

S6. GetPendingFn has grantReadWriteData when scoped permissions suffice (Code Reviewer)

Least-privilege concern — the handler only needs read + scoped write for rate-limit counters, not full table write.

S7. approval_row: dict in task_state.py is untyped (Type Analyzer)

No schema validation on the dict shape passed to transact_write_approval_request. A TypedDict would catch missing fields at call sites.

S8. No CI enforcement for CDK/CLI type sync (Type Analyzer)

The sync contract relies on developer discipline. A structural diff or export-name comparison would formalize it.

S9. Triple-codebase constant sync for APPROVAL_GATE_CAP_MIN/MAX (Type/Code Reviewers)

Bounds constants exist in cdk/types.ts, cli/types.ts, and agent/policy.py with no automated drift detection. Acknowledged as deferred in the PR.

S10. _try_progress swallows all exceptions with only WARN logging (Silent Failure Hunter)

15+ call sites could mask TypeError bugs. No counter or escalation for repeated failures — the user's live event stream silently goes incomplete.

Strengths

Fail-closed posture is largely substantiated: evaluate_tool_use maps ALL unhandled exceptions to DENY. Approval write failures produce distinct error types with specific handling
paths.
IMPL-24 late-approval race handling is thorough: The four-branch re-read path (APPROVED wins, DENIED wins, still-PENDING, row-gone) is well-tested and well-documented.
Cross-engine Cedar parity contract is excellent: Shared golden fixtures tested by both cedarpy (Python) and cedar-wasm (TypeScript) with guard against empty fixture directories.
Approval metrics module (approval-metrics.ts) is standout: Pure functions, no-fallback-to-zero policy, cardinality-bounded dimensions, event-time timestamps. Best-designed module in
the PR.
Policy engine (policy.py) has strong invariants: Gate-cap bounds, base_permit validation, cross-tier rule-ID uniqueness, 64KB blueprint cap, built-in hard-deny protection.
Secret redaction tested before persistence: Deny handler verifies secrets are stripped before they reach DDB, not after.
Cap and rate-limit tests verify no DDB side-effects: Tests assert both the deny response and that write_calls == [], confirming circuit breakers short-circuit before mutations.

cdk-nag introduced AwsSolutions-COG8 ("The Cognito user pool is not on the plus tier / feature plan") which fires on the dev UserPool and fails ``cdk deploy`` at synth time. Same rationale as the existing COG3 suppression (advanced-security / Plus tier features are not required for the dev environment's CLI-based auth flow). Surfaced during a live deploy attempt; required to unblock any ``cdk deploy`` against this branch.

C1 (fail-closed gap on PreToolUse hook wrapper) — agent/src/hooks.py The ``_pre`` wrapper around ``pre_tool_use_hook`` had no try/except guard, unlike the parallel ``_post`` and ``_stop`` wrappers. If the inner hook raised an unhandled exception (asyncio cancellation, a malformed-payload TypeError, etc.), the SDK's default behaviour for an unhandled hook exception is undefined — we cannot trust it to fail closed. Mapping every uncaught exception to a DENY at the SDK boundary makes the security posture explicit and consistent with ``_post``/``_stop``. Includes ERROR + CloudWatch ``log_error_cw`` mirroring so the crash is visible in the dashboard, not just stdout. C2 (CLI missing Attachment + attachments field) — cli/src/types.ts ``cdk/src/handlers/shared/types.ts::CreateTaskRequest`` defines an optional ``attachments: Attachment[]`` field plus an ``Attachment`` interface; the CLI mirror had neither. Added both, mirroring the upstream shape exactly per the AGENTS.md "Shared API request/ response shapes must stay in sync" contract. C3 (CLI missing prompt_version on TaskDetail) — cli/src/types.ts Same sync contract: CDK's ``TaskDetail`` includes ``prompt_version: string | null`` but the CLI mirror omitted it. Added with matching ``readonly`` + null-safe shape. Two test fixtures (format.test.ts + format-status-snapshot.test.ts) updated to include the new required field. Tests: - agent: pytest tests/test_hooks.py — 55 passed (no regression on the existing hook surface). - cli: jest — 238 passed across 22 suites.

I1 (TIMED_OUT exception path bypassed IMPL-24 re-read) — hooks.py The ``best_effort_update_approval_status("TIMED_OUT")`` call's exception handler set ``wrote = True``, causing the IMPL-24 re-read path to be skipped on transient DDB errors. If the user's APPROVED decision had already landed on the row when the timer fired, the agent would falsely deny the tool call. Changed to ``wrote = False`` on exception so the ConsistentRead fallback always runs; the re-read is a no-op if the row is still PENDING (we keep TIMED_OUT) but honors a late APPROVED/DENIED if present. I2 (write_terminal precondition excluded AWAITING_APPROVAL) — task_state.py ``write_terminal`` only accepted RUNNING/HYDRATING/FINALIZING as prior statuses. If the container crashed while AWAITING_APPROVAL, the shutdown handler's terminal-write would fail ConditionalCheckFailed and the task would hang for ~2 hours until the stranded-task reconciler catches it. Added AWAITING_APPROVAL to the IN clause; the Cedar HITL state machine (design §9) explicitly allows RUNNING ↔ AWAITING_APPROVAL ↔ terminal transitions. Tests: pytest tests/test_hooks.py tests/test_task_state.py — 109 passed (no regressions on existing assertions; both fixes are defensive widenings of failure-path coverage).

I3 (task_state.py used print()) — agent/src/task_state.py All 14 ``print()`` calls migrated to ``shell.log()`` with appropriate severities. Bare prints route to container stdout, which AgentCore sends to ``runtime-DEFAULT`` rather than the APPLICATION_LOGS group that TaskDashboard widgets and ``bgagent status`` read — DDB write failures for approval state transitions were invisible to all monitoring. Severity mapping: - INFO for benign condition-skips (race-with-cancel, idempotent-replay) - WARN for best-effort write failures (heartbeat, session, terminal) - ERROR (via ``log_error_cw``) for the trace-S3-orphan path so the operator-actionable case lands in APPLICATION_LOGS, distinct from the noisy benign skips. I4 (scope-validation failures unlogged) — cdk/src/handlers/approve-task.ts ``parseApprovalScope`` failures returned 400 silently. Malformed scopes are security-relevant: they can indicate probing, CLI version drift, or downstream contract drift. Added ``logger.warn`` with structured fields (task_id, user_id, raw_scope, parser-error, request_id) so a CloudWatch Insights query can triage the cause. S10 (_try_progress swallowed all exceptions) — agent/src/hooks.py Repeated progress-write failures were emitting individual WARN lines with no escalation. A throttled DDB or IAM regression on the TaskEventsTable would silently break the user's live event stream. Added a per-method consecutive-failure counter (``method_name`` → count) that escalates to ``log_error_cw`` after 5 consecutive failures, then resets to avoid spam. Per-method state so a single failing method doesn't suppress visibility into other progress-write paths that may still be working. Tests: - agent: pytest tests/test_hooks.py tests/test_task_state.py — 109 passed. - cdk: jest test/handlers/approve-task.test.ts — 14 passed.

… IAM S5 (formatMinuteBucket duplicated in 5 handlers) — extracted to ``cdk/src/handlers/shared/rate-limit.ts``. The same identical ``yyyymmddhhmm`` UTC-bucket helper was inlined in approve-task, deny-task, get-pending, get-policies, and nudge-task — divergence here would silently invalidate per-minute rate-limit counters across handlers. Centralising it pins the bucket boundary. S6 (GetPendingFn over-permissioned) — replaced ``grantReadWriteData`` with two explicit ``PolicyStatement`` grants: ``dynamodb:Query`` on the table + GSI ARNs (the only read pattern the handler uses), and ``dynamodb:UpdateItem`` on just the table ARN (for the synthetic ``RATE#<user_id>#PENDING`` rate-limit row). Drops PutItem, BatchWrite, DeleteItem, and BatchGet permissions that were latent in the prior wildcard grant — none of which the handler ever exercises. Pinned to specific ARNs (not ``/*``). Tests: jest on the 5 touched handlers + task-api construct — 116 passed across 6 suites.

…ias + TypedDict S3 (ApprovalRecord allowed illegal status combos) — types.ts Refactored ``ApprovalRecord`` from a single interface with all decision-time fields optional into a discriminated union on ``status``. Pre-S3 a PENDING record could type-check with ``decided_at`` set, an APPROVED record could type-check without ``scope``, etc. New shape: - ``PendingApprovalRecord`` (status: 'PENDING') — no decision fields. - ``ApprovedApprovalRecord`` (status: 'APPROVED') — requires ``decided_at`` + ``scope``. - ``DeniedApprovalRecord`` (status: 'DENIED') — requires ``decided_at``, optional ``deny_reason``. - ``TimedOutApprovalRecord`` (status: 'TIMED_OUT') — requires ``decided_at``. - ``StrandedApprovalRecord`` (status: 'STRANDED') — requires ``decided_at`` (set by reconciler). Common fields (task_id, request_id, severity, etc.) live on a shared ``ApprovalRecordBase`` interface; the public ``ApprovalRecord`` is the union. Callers gain TS narrowing on ``status`` plus exhaustiveness checking — illegal combinations no longer compile. S4 (Severity literal repeated 6+ times) — types.ts, cedar-policy.ts, get-pending.ts, cli/types.ts Extracted ``export type Severity = 'low' | 'medium' | 'high'`` to ``cdk/src/handlers/shared/types.ts`` (canonical) + ``cli/src/types.ts`` (mirrored per the CDK↔CLI sync contract). Replaced inline literals in ``ApprovalRecord``, ``PendingApprovalSummary``, ``PolicyRuleSummary``, ``ParsedRule.severity``, and ``coerceSeverity``'s return type. A single source of truth means severity additions land in one place. S7 (approval_row was bare dict) — agent/src/task_state.py Introduced ``ApprovalRow(TypedDict)`` as the typed contract for ``transact_write_approval_request``'s third argument. Mirrors the TS ``ApprovalRecordBase`` field set so callers get static verification of the row shape before it hits DDB. Pre-S7 a missing or misspelled field would only fail at runtime via the DDB schema check. Tests: - agent: pytest tests/test_hooks.py tests/test_task_state.py — 109 passed. - cdk: jest test/handlers/{approve-task,deny-task,get-pending}.test.ts + test/handlers/shared/* — 643 passed across 29 suites. - Both packages pass ``tsc --noEmit`` with no new errors.

Replaces developer discipline with automated enforcement of the AGENTS.md "Shared API request/response shapes must stay in sync" contract. scripts/check-types-sync.ts (new) TypeScript-compiler-API-based comparator. For every export name that exists in BOTH cdk/src/handlers/shared/types.ts and cli/src/types.ts, normalizes its structural shape (sorted member names + types, sorted union members, normalised whitespace) and asserts they match. Two allowlists make intent explicit: - ``CDK_ONLY_ALLOWLIST`` — server-only persistence shapes (TaskRecord, every ApprovalRecord variant, NudgeRecord, EventRecord, WebhookRecord), notification-config types, and bound constants whose runtime source of truth lives in agent/src/policy.py. - ``CLI_ONLY_ALLOWLIST`` — CLI on-disk config (CliConfig, Credentials), response envelopes the API doesn't type explicitly (Pagination, ErrorResponse), and CLI display contracts (ErrorClassification, ErrorCategoryType, TaskEvent). Re-export declarations (``export type { Foo }`` from a sibling module) are recognised so canonical types declared elsewhere (TaskStatusType in cdk/src/constructs/task-status.ts) can be mirrored without duplicating the structural source of truth. Drifts found and fixed by this commit: - TaskDetail.status / TaskSummary.status — CDK had ``TaskStatusType``, CLI had bare ``string``. Mirrored ``TaskStatusType`` literal union into cli/src/types.ts so both surfaces narrow on the API's known statuses. - CreateTaskRequest.attachments — CDK had ``Attachment[]``, CLI had ``readonly Attachment[]``. Adopted the stricter readonly on CDK to match. - CreateTaskRequest.initial_approvals — CDK had ``readonly string[]``, CLI had ``readonly ApprovalScope[]``. Adopted the narrower union on CDK; the agent's ``parse_approval_scope`` already requires the same shape. - TaskStatusType is now re-exported from cdk/src/handlers/shared/types.ts so the script can locate the canonical declaration. Wiring: - ``mise run check:types-sync`` task in mise.toml. - prek pre-commit hook ``types-sync-cdk-cli`` that runs only when ``cdk/src/handlers/shared/types.ts``, ``cli/src/types.ts``, or the script itself changes — fast skip on unrelated edits. Tests: cli jest — 238 passed across 22 suites; cdk + cli tsc clean. ``mise run check:types-sync`` reports OK (45 CLI exports validated against 48 CDK exports, 18 server-only allowlisted).

…failures Closes I5 + S1 + S2 from the PR review. I5 (DDB transactions tested only via _FakeTaskState in hook tests) agent/tests/test_task_state.py The existing TestTransactWriteApprovalRequest / TestTransactResumeFromApproval suites verified call args but did not pin the exact ConditionExpression strings — the actual fail-closed enforcement layer. Added two focused tests: - ``test_exact_condition_expressions_pinned`` — asserts the Put's ``attribute_not_exists(request_id)`` and the Update's ``#s = :running`` precondition; explicitly asserts that ``:awaiting`` is NOT a precondition for the initial gate-write (so a runaway gate cascade can't re-pause an already-paused task). - ``test_condition_failed_reason_includes_both_branches`` — asserts both cancellation reasons propagate so the hook's error-classification can distinguish "request_id collision" from "task moved past RUNNING". - ``test_resume_condition_pins_joint_invariant`` — asserts both halves of the resume's joint ``status = AWAITING_APPROVAL AND awaiting_approval_request_id = :rid`` condition are present and that ``awaiting_approval_request_id`` is REMOVE'd (not just overwritten) on resume. S1 (no concurrent approve+deny test) cdk/test/handlers/approval-concurrency.test.ts (new file) Pins the at-most-one-decision invariant for the two handler entry points: simulating two near-simultaneous approve+deny requests on the same ``request_id``, the test asserts that the second writer receives ConditionalCheckFailed (mapped to 404 / 409) while the first succeeds with 202. Both directions covered (approve-then- deny and deny-then-approve) so a future change that broke ONLY one direction (e.g. by accidentally omitting the condition on the deny path) is caught. S2 (_poll_for_decision consecutive-failure path untested) agent/tests/test_poll_for_decision.py (new file) Four targeted tests: - ``test_n_consecutive_failures_returns_timed_out_with_reason`` — injects ``POLL_MAX_CONSECUTIVE_FAILS`` (10) consecutive ``RuntimeError`` from ``ts.get_approval_row``; asserts the outcome is TIMED_OUT with a non-null ``reason`` containing "consecutive". Distinguishes the infra-failure path from the deadline-fired path (which has ``reason == None``). - ``test_degraded_emitted_once_at_threshold`` — asserts ``approval_poll_degraded`` fires exactly once at ``POLL_DEGRADED_FAILS`` (3), not on every subsequent poll. - ``test_recovery_resets_failure_counter`` — 5 fails then a PENDING then APPROVED; the counter must reset on recovery so intermittent blips don't accumulate to the timeout threshold. - ``test_deadline_beats_failures_when_timeout_short`` — short timeout fires the deadline-path ``reason == None`` outcome even when polls would otherwise succeed. Tests: - agent: 116 passed (test_task_state + test_hooks + test_poll_for_decision; was 109 baseline, +4 poll +3 task_state). - cdk: approval-concurrency test_suite — 2 passed.

Aggregates hooks-suggested fixes that surfaced after the seven thematic PR-review-response commits ran through the full prek pre-commit and pre-push gates: - agent/Dockerfile — added ``apt-get upgrade -y --no-install-recommends`` early in the apt block. Without this, debian-base CVEs ride the ``python:3.13-slim`` tag until upstream rebuilds. Specifically cleared GHSA libnghttp2 CVE-2026-27135 (HIGH) flagged by trivy on the built image. Narrow upgrade — only already-installed packages get bumped. - yarn.lock — bumped transitive ``devalue ^5.6.3`` from 5.8.0 → 5.8.1 to clear GHSA-77vg-94rm-hx3p (HIGH, CVSS 7.5). Astro pulls devalue; the existing ``^5.6.3`` constraint accepts 5.8.1 already. - agent/src/task_state.py — split a long log() call across multiple lines to satisfy ruff E501 line-length (101 → ≤100). - agent/tests/test_task_state.py — same line-length fix on a class method signature; cast the deliberately-malformed ``bad_row`` in ``test_unsupported_row_type_rejected`` to ``Any`` so the new TypedDict (S7) doesn't reject the test at compile time. The test exists to verify the runtime guard, which static-typing would otherwise pre-empt. - agent/tests/test_poll_for_decision.py — eslint-style trailing-blank cleanup applied by prek. - cdk/src/handlers/shared/cedar-policy.ts — eslint auto-sorted the imports such that ``import type { Severity }`` ended up between the ``// eslint-disable-next-line`` comment and the ``require()`` it was meant to suppress, breaking the suppression and failing the hook. Moved the type import above the disable comment. - cdk/src/handlers/{approve,deny,get-pending,get-policies,nudge}-task.ts — trailing-newline normalisation from prek's ``fix end of files``. All 19 hook steps green: pre-commit (12) + pre-push (7). Zero CVEs.

scoropeza · 2026-05-15T21:48:16Z

Thanks for the detailed automated review @krokoko! Addressed in 7 thematic commits + 1 chore fixup on top of 2d11918. Push: 2d11918..72e614a.

ID	Status	Commit	Notes
C1	✅ Fixed	`77ff903`	`_pre` wrapper now has try/except like `_post`/`_stop`; uncaught exceptions map to fail-closed deny via `_deny_response`.
C2	✅ Fixed	`77ff903`	`Attachment` interface + `attachments?: readonly Attachment[]` field added to `cli/src/types.ts`.
C3	✅ Fixed	`77ff903`	`prompt_version: string \| null` added to CLI's TaskDetail; two test fixtures updated.
I1	✅ Fixed	`3060316`	`wrote = False` (was `True`) on TIMED_OUT exception so the IMPL-24 re-read always runs. Comment explains the late-approval hand-off.
I2	✅ Fixed	`3060316`	`AWAITING_APPROVAL` added to `write_terminal`'s `ConditionExpression IN (…)` clause. Cedar-HITL state machine §9 explicitly allows `AWAITING_APPROVAL → terminal`.
I3	✅ Fixed	`7c65704`	All 14 `print()` in `task_state.py` migrated to `shell.log()` with severity-appropriate level (INFO benign skips / WARN best-effort failures / ERROR via `log_error_cw` for the trace-S3-orphan path that's operator-actionable).
I4	✅ Fixed	`7c65704`	`logger.warn` with structured fields (`task_id`, `user_id`, `raw_scope`, `error`) added before the 400 return in `approve-task.ts`.
I5	✅ Fixed	`e79ac4b`	New tests pinning the exact ConditionExpression strings on `transact_write_approval_request` (Put + Update preconditions) and `transact_resume_from_approval` (joint `status = AWAITING_APPROVAL AND awaiting_approval_request_id = :rid`).
S1	✅ Fixed	`e79ac4b`	New `cdk/test/handlers/approval-concurrency.test.ts` covers both directions (approve-then-deny + deny-then-approve); pins the at-most-one-decision invariant via `ConditionalCheckFailedException` mocking.
S2	✅ Fixed	`e79ac4b`	New `agent/tests/test_poll_for_decision.py`: 4 tests covering `POLL_MAX_CONSECUTIVE_FAILS` infra-failure path (with distinct `reason`), `POLL_DEGRADED_FAILS` once-only signal, recovery-resets-counter, and the deadline-vs-failures distinction.
S3	✅ Fixed	`238b23f`	`ApprovalRecord` refactored to a discriminated union (Pending/Approved/Denied/TimedOut/Stranded variants). Illegal status combinations no longer compile.
S4	✅ Fixed	`238b23f`	`Severity` type alias added to `cdk/src/handlers/shared/types.ts` and `cli/src/types.ts`; 5 inline literals replaced.
S5	✅ Fixed	`d2e343e`	`formatMinuteBucket` extracted to `cdk/src/handlers/shared/rate-limit.ts`; 5 callsites updated.
S6	✅ Fixed	`d2e343e`	`GetPendingFn` IAM tightened to explicit `dynamodb:Query` (table + GSI ARNs) + `dynamodb:UpdateItem` (table only). Drops latent PutItem/DeleteItem/BatchWrite from the prior `grantReadWriteData` wildcard.
S7	✅ Fixed	`238b23f`	`ApprovalRow(TypedDict)` introduced for `transact_write_approval_request`'s row argument.
S8	✅ Fixed	`fe1e64f`	New `scripts/check-types-sync.ts` + `mise run check:types-sync` task + prek pre-commit hook. Compares all exported interfaces / type aliases / consts between CDK and CLI types files. CDK_ONLY / CLI_ONLY allowlists explicit. Surfaced and fixed 3 pre-existing drifts (`TaskStatusType` narrowing, `attachments` readonly, `initial_approvals` ApprovalScope[]).
S9	🟡 Discussion	(next comment)	Filing as a follow-up — the triplication is real but the fix has a real design surface. See dedicated comment below.
S10	✅ Fixed	`7c65704`	`_try_progress` now tracks per-method consecutive-failure counts and escalates to `log_error_cw` after 5 consecutive failures, then resets to avoid spam.

Test coverage delta

Agent pytest: 756 → 763 (+7 — 4 poll, 3 task_state)
CDK jest: 1526 → 1528 (+2 — concurrency suite)
CLI jest: 238 (no count change; 2 fixture-only tests updated for prompt_version)
All tsc --noEmit clean on both packages
mise run check:types-sync reports OK (45 CLI exports validated against 48 CDK exports, 18 server-only allowlisted)
Full pre-commit + pre-push hooks (19 steps) green; zero CVEs

72e614a is a follow-up chore fixup that aggregates lint-style + small dependency bumps the hooks surfaced (debian libnghttp2 base-image CVE via apt-get upgrade in the Dockerfile, transitive devalue 5.8.0 → 5.8.1, ESLint import-order conflict with a require() suppression, ruff line-length).

Filing S9 as a separate comment because it has actual design trade-offs.

scoropeza · 2026-05-15T21:48:39Z

On S9 (APPROVAL_GATE_CAP_MIN/MAX triplication) — wanted to call this out separately because the fix has real design trade-offs and I'd value your read.

The constants live in three places, each with a real reason:

agent/src/policy.py — Python runtime, validates at PolicyEngine construction.
cdk/src/handlers/shared/types.ts — TypeScript, validates at submit-time in CreateTaskFn.
cdk/src/constructs/blueprint.ts — TypeScript, validates at construct-instantiation time. Has an explicit comment: "Kept here (not imported from handlers/shared/types.ts) so the construct layer does not pull in runtime-side types".

Cross-language sharing has multiple defensible answers:

Option A — TS-only canonical, codegen Python at build time. Real codegen tooling, but pinned at deploy. New step in mise run install / build pipeline.

Option B — Build-time assertion script. Like the new S8 type-sync check but for constants: read all 3 files, fail CI on mismatch. No codegen, no single source, but drift detection. Probably cheapest to land — could literally extend scripts/check-types-sync.ts with a "constants block" that compares numeric literals across the three files.

Option C — Move bounds out of all 3 files into contracts/constants.json. Same pattern as contracts/cedar-parity/. Each language reads the JSON at its appropriate phase (policy.py at import, blueprint.ts via JSON.parse(fs.readFileSync(...)) at construct init, types.ts via import of .json if resolveJsonModule is on).

Option D — Keep triplicated, accept drift. Rely on E2E tests catching divergence. (Status quo + a comment.)

My lean is C because it generalizes for the next "shared-across-languages constant" we hit (likely sooner than later — APPROVAL_TIMEOUT_S_MIN/MAX have the same shape, as does INITIAL_APPROVALS_MAX_ENTRIES), and the contracts/ directory pattern already exists. B is faster to land if you'd rather defer the architectural choice. Either way, this is a real ~half-day of work + tests + docs.

Happy to follow whichever direction you prefer; can take this on after PR landing as a separate issue (I'll file it tagged with this thread).

krokoko · 2026-05-15T23:39:47Z

"can take this on after PR landing as a separate issue (I'll file it tagged with this thread)." -> fine by me

Closes the S9 triplication called out in PR aws-samples#88's design discussion (issuecomment-4463943269 — Option C). The bounds now live in a single JSON contract consumed by Python (agent/src/policy.py at import), TypeScript runtime (cdk/src/handlers/shared/types.ts via resolveJsonModule), and TypeScript synth (cdk/src/constructs/blueprint.ts). The AgentCore Docker build context moves from agent/ to repo root so the Dockerfile can COPY contracts/ alongside agent/. A new repo-root .dockerignore keeps the asset lean (cdk/cdk.out/, node_modules/, build outputs excluded). scripts/check-constants-sync.ts is a dedicated drift check (mise task + prek hook) that rejects any re-introduction of literal assignments for the owned names in agent/src/policy.py. The TS side is enforced by tsc via resolveJsonModule.

Absorbs 8 upstream commits (883a1c6, 47caaf9, 3a06c20, 8ad11d1, 3b0fc19, c3b1cdd, 992e6bd, 5d9748f) and resolves two areas: - agent/Dockerfile: dropped upstream's redundant `apt-get upgrade libnghttp2-14` (aws-samples#101) because our generic `apt-get upgrade -y --no-install-recommends` from 72e614a is a strict superset (per feedback_dockerfile_apt_upgrade.md). - yarn.lock: regenerated via `yarn install` after taking upstream's version, picking up both their devalue bump (aws-samples#94) and our @aws-sdk/client-bedrock-agentcore version. All CDK + agent tests green; full prek hook stack passes.

scoropeza added 26 commits May 6, 2026 20:38

scoropeza marked this pull request as ready for review May 14, 2026 17:03

scoropeza requested a review from a team as a code owner May 14, 2026 17:03

scoropeza added 8 commits May 15, 2026 13:33

scoropeza force-pushed the feature/cedar-hitl-engine branch from dc6164d to 0a21652 Compare May 16, 2026 01:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Cedar HITL approval gates for agent tool use#88

feat: Cedar HITL approval gates for agent tool use#88
scoropeza wants to merge 37 commits into
aws-samples:mainfrom
scoropeza:feature/cedar-hitl-engine

scoropeza commented May 13, 2026 •

edited

Loading

Uh oh!

krokoko commented May 15, 2026

Uh oh!

scoropeza commented May 15, 2026

Uh oh!

scoropeza commented May 15, 2026

Uh oh!

krokoko commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

scoropeza commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's new (user-visible)

User-facing documentation

Safety posture

Notable merge surgery

Test plan

Deferred / follow-up

Uh oh!

krokoko commented May 15, 2026

Uh oh!

scoropeza commented May 15, 2026

Test coverage delta

Uh oh!

scoropeza commented May 15, 2026

Uh oh!

krokoko commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

scoropeza commented May 13, 2026 •

edited

Loading